# 7.2.8 Statistical analysis

Because all the survey variables are nominal, the range of applicable statistical tests is restricted. The most appropriate is contingency table analysis.

Contingency table analyses determine whether a relationship exists between two nominal variables. Other statistics (t-tests, regressions, means, correlation tests) apply to dependent variables that are continuous, that is, variables capable of taking on many different values with an obvious ordering, such as height, weight, income, chemical concentration or sales. Tests designed for continuous variables lose their validity when applied to nominal variables, which have no ordered, continuous property. [Abacus1996, p. 81].

Contingency tables (called cross-tabulations in SPSS) arrange the data in a two-way table showing the groupings for each of two variables. For instance, a contingency table for the variables Society and Industry Category shows all the possible answers for Society on one axis and all the possible answers for Industry Category on the other. Each cell shows the number of observations for one combination of answers. For this survey, the Society by Industry Category contingency table is shown in Table 7-4.

###### Table 7-4: Observed Frequencies for Society, Industry Category (N=1038)

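| Society | Blank | Combined | Consultant | Education | Government | Industry | Other | Totals |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| Email | 1 | 4 | 22 | 239 | 10 | 36 | 24 | 336 |
| APA | 3 | 12 | 168 | 171 | 43 | 18 | 71 | 486 |
| APS | 0 | 1 | 45 | 19 | 10 | 6 | 6 | 87 |
| BPS | 0 | 1 | 10 | 95 | 11 | 3 | 9 | 129 |
| Totals | 4 | 18 | 245 | 524 | 74 | 63 | 110 | 1038 |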
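A cross-tabulation like Table 7-4 is built from individual responses by counting each (Society, Industry Category) pair. The sketch below uses only the Python standard library; the record layout and the field names `society` and `industry` are hypothetical and are not taken from the survey data files.

```python
from collections import Counter


def crosstab(records, row_key, col_key):
    """Count observations for every (row group, column group) pair.

    Returns (row_labels, col_labels, counts), where counts[i][j] is the
    number of records with row_labels[i] and col_labels[j].
    """
    pairs = Counter((r[row_key], r[col_key]) for r in records)
    row_labels = sorted({r[row_key] for r in records})
    col_labels = sorted({r[col_key] for r in records})
    # Counter returns 0 for combinations that never occur.
    counts = [[pairs[(row, col)] for col in col_labels] for row in row_labels]
    return row_labels, col_labels, counts


# Hypothetical respondent records; field names are illustrative only.
responses = [
    {"society": "APA", "industry": "Education"},
    {"society": "APA", "industry": "Consultant"},
    {"society": "BPS", "industry": "Education"},
    {"society": "Email", "industry": "Education"},
    {"society": "BPS", "industry": "Education"},
]

rows, cols, table = crosstab(responses, "society", "industry")
for label, counts in zip(rows, table):
    print(label, counts)
```

Adding a Totals row and column is then a matter of summing each row and each column of `counts`.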

Once a contingency table has been constructed, the values can be examined to see which combinations of answers have more or fewer observations than would be expected if the two variables were independent. The appropriate statistical test in this case is the chi-square test for independence.

The hypothesis of independence states that the likelihood of an observation falling into a given group for one variable is independent of the group it falls into for the other variable. To calculate this test, Statview finds the expected number of observations for every combination of groups under the hypothesis of independence and compares the expected with the observed value in each cell. [Abacus1996, p. 82].
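The expected count and the test statistic can be written out explicitly. The notation below is the standard textbook form and is an assumption here, since the manual's own formulas are not quoted: O_ij is the observed count in row i and column j, R_i and C_j are the row and column totals, N is the grand total, and r and c are the numbers of rows and columns.

```latex
E_{ij} = \frac{R_i \, C_j}{N}, \qquad
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
\nu = (r - 1)(c - 1)
```

For the Email/Education cell of Table 7-4, for example, E = 336 × 524 / 1038 ≈ 169.6, against 239 observed. With r = 4 societies and c = 7 industry categories, the statistic has ν = 3 × 6 = 18 degrees of freedom, for which the 0.05 critical value is about 28.87.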

The null hypothesis is that the two variables are independent. That is, a respondent from the BPS is no more likely to work in an educational institution than in any other industry category and, conversely, a respondent who works as a consultant is no more likely to be a member of the APA than of any other society. If one calculates a low chi-square value (and a correspondingly high probability, indicated by the letter p) for a particular pair of variables, there are no grounds to reject the null hypothesis.

If the null hypothesis is rejected (on the basis of a large chi-square value and a correspondingly low p value), then a relationship between the two variables has been identified. The contingency table can then be examined in more detail to find the particular combinations of groups where the observed count differs significantly from the expected count.
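The whole test can be sketched in plain Python. This is an illustration of the standard Pearson chi-square test, not Statview's implementation; the function names are hypothetical. The p-value uses the closed-form upper-tail series for a chi-square distribution with an integer number of degrees of freedom (a Poisson sum for even df, an `erfc`-based sum for odd df), so no statistics library is needed.

```python
import math

# Table 7-4: rows Email, APA, APS, BPS; columns Blank, Combined,
# Consultant, Education, Government, Industry, Other.
OBSERVED = [
    [1, 4, 22, 239, 10, 36, 24],
    [3, 12, 168, 171, 43, 18, 71],
    [0, 1, 45, 19, 10, 6, 6],
    [0, 1, 10, 95, 11, 3, 9],
]


def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_r x^(r - 1/2) / (1*3*...*(2r-1))
    total, term = 0.0, math.sqrt(x)
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    phi = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(math.sqrt(x / 2)) + 2 * phi * total


def chi_square_test(observed):
    """Return (chi-square statistic, degrees of freedom, p-value)."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, chi2_sf(stat, df)


stat, df, p = chi_square_test(OBSERVED)
decision = "reject independence" if p < 0.05 else "do not reject independence"
print(f"chi-square = {stat:.1f}, df = {df}, p = {p:.3g}: {decision}")
```

Table 7-4 contains several cells with expected counts below five (the whole Blank column, for instance), so under the validity rule discussed in this section its p-value should not be relied on; the example only illustrates the mechanics.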

Statview provides both an expected values table and a table of post-hoc cell contributions to the overall chi-square statistic to assist with this. The expected values table shows the expected number of observations for every combination of groups under the hypothesis of independence. Note that the chi-square test is not valid when the expected value in any cell is less than five; observed values below five in a cell cause no such problem.
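The validity rule can be checked mechanically by listing every cell whose expected count falls below five. The sketch below applies it to Table 7-4; the function name and the `threshold` parameter are hypothetical.

```python
# Table 7-4 row and column labels and observed counts.
ROWS = ["Email", "APA", "APS", "BPS"]
COLS = ["Blank", "Combined", "Consultant", "Education", "Government", "Industry", "Other"]
OBSERVED = [
    [1, 4, 22, 239, 10, 36, 24],
    [3, 12, 168, 171, 43, 18, 71],
    [0, 1, 45, 19, 10, 6, 6],
    [0, 1, 10, 95, 11, 3, 9],
]


def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def small_expected_cells(observed, row_labels, col_labels, threshold=5.0):
    """List (row, column, expected) for every cell whose expected count is below threshold."""
    expected = expected_counts(observed)
    return [
        (row_labels[i], col_labels[j], e)
        for i, exp_row in enumerate(expected)
        for j, e in enumerate(exp_row)
        if e < threshold
    ]


small = small_expected_cells(OBSERVED, ROWS, COLS)
for row, col, e in small:
    print(f"{row}/{col}: expected {e:.2f}")
if small:
    print("At least one expected count is below 5: the chi-square p value is not valid.")
```

Applied to Table 7-4, the check flags all four cells of the Blank column and the APS and BPS cells of the Combined column, even though the table has 1038 observations in total.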

The post-hoc cell contributions are a form of standardized residual that indicate what each cell in the table contributes to the chi-square statistic. Since they are calculated to follow a standard normal distribution, absolute values greater than, for example, 1.96 for a 0.05 probability level indicate that the cell in question provides significant information about the combinations of groups of the variables whose occurrence is different than what would be expected under the hypothesis of independence. [Abacus1996, pp. 82-3].
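The manual excerpt does not give Statview's exact formula for the cell contributions. One common standardized residual that is approximately standard normal under independence is Haberman's adjusted residual, (O − E) / √(E(1 − R/N)(1 − C/N)). The sketch below uses it as an assumption, so its values may not match Statview's output exactly; the 1.96 cut-off follows the text.

```python
import math

# Table 7-4 labels and observed counts.
ROWS = ["Email", "APA", "APS", "BPS"]
COLS = ["Blank", "Combined", "Consultant", "Education", "Government", "Industry", "Other"]
OBSERVED = [
    [1, 4, 22, 239, 10, 36, 24],
    [3, 12, 168, 171, 43, 18, 71],
    [0, 1, 45, 19, 10, 6, 6],
    [0, 1, 10, 95, 11, 3, 9],
]


def adjusted_residuals(observed):
    """Adjusted standardized residuals (O - E) / sqrt(E * (1 - R/N) * (1 - C/N))."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(observed):
        out = []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            variance = e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out.append((o - e) / math.sqrt(variance))
        result.append(out)
    return result


z = adjusted_residuals(OBSERVED)
for i, row in enumerate(z):
    for j, value in enumerate(row):
        if abs(value) > 1.96:  # two-sided 0.05 level
            print(f"{ROWS[i]}/{COLS[j]}: z = {value:+.2f}")
```

A positive value marks a combination that occurs more often than independence predicts (Email respondents working in Education, for example), and a negative value one that occurs less often (Email respondents working as consultants).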

The main use of the chi-square test on contingency tables was to determine whether all the print survey subgroups could be treated as an aggregate. To test this, the dataset was first restricted to print replies only. (The Society variable, restricted in this way to print survey responses, is referred to from now on as Society-Print.) A contingency table was then constructed for each variable against Society-Print; the Society variable distinguishes the print subgroups from each other, as well as from the email survey. For each variable, the null hypothesis was that Society-Print and the variable under consideration were independent, and the chi-square value and its probability were calculated. There are three possible outcomes from this analysis.

If no expected cell in the contingency table for a given variable is less than five and the chi-square p value is greater than 0.05, then the responses are consistent with the print subgroups being independent of the variable. The subgroups were therefore aggregated and treated as a block (relative to the email responses) for these questions. Variables in this category are T-CD Drive, T-Sound, and T-Colour.

If no expected cell in the contingency table for a given variable is less than five and the chi-square p value is less than 0.05, then the variable is dependent on Society-Print. The variables in this group are T-Network, T-Modem, F-Subscribe, F-Web, F-CDROM, and F-Views. In this case the response patterns of the subgroups differ significantly, so the subgroups could not be treated as an aggregate for these questions.

For the remaining survey questions, at least one expected cell in the contingency table had a value of less than five. The chi-square p value calculated for the disaggregated print survey subgroups was therefore not valid and could not be used as the basis for analysis. For these variables, the print subgroups were also aggregated and compared as a block with the email survey responses. Variables in this category are Industry Category, Employee Role, T-PC, F-Ftp, F-Gopher, F-Psyche, F-Publishes, all the A (Advantages) variables and all the D (Disadvantages) variables.
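The three outcomes amount to a simple decision rule. The sketch below is an illustration rather than the procedure actually run in Statview: it restricts Table 7-4 to the print rows (Society-Print) and applies the rule to Industry Category. Working from the aggregated table instead of raw response records, the enum and function names, and treating p exactly equal to 0.05 as not significant are all assumptions made for brevity.

```python
from enum import Enum

# Table 7-4 keyed by Society; columns Blank, Combined, Consultant,
# Education, Government, Industry, Other.
TABLE_7_4 = {
    "Email": [1, 4, 22, 239, 10, 36, 24],
    "APA": [3, 12, 168, 171, 43, 18, 71],
    "APS": [0, 1, 45, 19, 10, 6, 6],
    "BPS": [0, 1, 10, 95, 11, 3, 9],
}


class Outcome(Enum):
    INDEPENDENT = "p > 0.05: aggregate the print subgroups"
    DEPENDENT = "p < 0.05: keep the print subgroups separate"
    INVALID = "expected cell < 5: test not valid, aggregate the print subgroups"


def restrict_to_print(table):
    """Drop the Email row, leaving the Society-Print subgroups."""
    return [row for society, row in table.items() if society != "Email"]


def min_expected(rows):
    """Smallest expected count under independence (row total * column total / N)."""
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(c) for c in zip(*rows)]
    n = sum(row_totals)
    return min(r * c / n for r in row_totals for c in col_totals)


def classify(min_exp, p_value=None, alpha=0.05, min_cell=5.0):
    """Map a Society-Print chi-square result onto the three outcomes."""
    if min_exp < min_cell:
        return Outcome.INVALID  # the p value is not consulted
    if p_value is None:
        raise ValueError("a p value is required when the test is valid")
    if p_value < alpha:
        return Outcome.DEPENDENT
    return Outcome.INDEPENDENT  # assumption: p == alpha counts as not significant


society_print = restrict_to_print(TABLE_7_4)
m = min_expected(society_print)
outcome = classify(m)
print(f"Industry Category: minimum expected count {m:.2f} -> {outcome.value}")
```

The smallest expected count comes from the APS row and the Blank column (87 × 3 / 702 ≈ 0.37), so Industry Category falls into the third group, as listed above.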

Where the contingency table analysis indicated that it was reasonable or necessary to group the print subgroups together, there could still be statistically significant differences between the email and print surveys. To explore this further, contingency tables were constructed between the Survey variable (which distinguishes email from print responses) and each of the technology variables. This gave a much clearer picture of how the results were distributed between the groups.
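Aggregating the print subgroups amounts to summing their rows, which turns a Society table into a Survey table with an Email row and a Print row. The sketch below uses Table 7-4 for illustration, since it is the only table reproduced in this section; the names `SURVEY_OF` and `collapse_by_survey` are hypothetical.

```python
# Table 7-4 keyed by Society; columns Blank, Combined, Consultant,
# Education, Government, Industry, Other.
TABLE_7_4 = {
    "Email": [1, 4, 22, 239, 10, 36, 24],
    "APA": [3, 12, 168, 171, 43, 18, 71],
    "APS": [0, 1, 45, 19, 10, 6, 6],
    "BPS": [0, 1, 10, 95, 11, 3, 9],
}

# Illustrative recoding: the email survey stays separate and the three
# print societies collapse into a single "Print" group.
SURVEY_OF = {"Email": "Email", "APA": "Print", "APS": "Print", "BPS": "Print"}


def collapse_by_survey(table):
    """Sum Society rows into Survey rows (Email vs Print)."""
    collapsed = {}
    for society, row in table.items():
        survey = SURVEY_OF[society]
        if survey not in collapsed:
            collapsed[survey] = [0] * len(row)
        collapsed[survey] = [a + b for a, b in zip(collapsed[survey], row)]
    return collapsed


by_survey = collapse_by_survey(TABLE_7_4)
for survey, row in by_survey.items():
    print(survey, row, sum(row))
```

The resulting Survey table (336 email and 702 print responses) can then be analysed with the chi-square test described earlier in this section.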

Thus, when the results of the contingency tables are discussed, the variable compared with the survey question under discussion will be one of two possibilities. If the analysis of the print subgroups supported aggregating them, the print subgroups are compared as a whole with the email survey, and the variable referred to is Survey. If the analysis rejected the hypothesis of independence between the question and the three psychological societies within the print survey, the print subgroups cannot be aggregated and treated together. In this case the variable referred to is Society, which distinguishes the three psychological societies and the email survey as four separate groups. In other words, the email survey is treated as another 'society' for convenience of analysis.