Compare means
Compute the means of a variable in different groups of a data table.
Details
The arithmetic mean of a set of values is calculated by summing the values and dividing the result by the number of values. It is often used to summarise the central tendency of data but can be strongly distorted by outliers. This module has a trim parameter which, after ordering the values, symmetrically removes a proportion of them from each end. This is useful for excluding a small number of outliers when the central portion of the data is of interest. More robust statistics, such as the median, may also be of interest in these cases.
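As a rough illustration of trimming, the Python sketch below orders the values, drops a proportion from each end and averages what remains. The function name `trimmed_mean`, the rounding down of the number of dropped values and the fall-back to the median when nothing is left are assumptions made for this example, not a description of the module's internals.

```python
from statistics import mean, median


def trimmed_mean(values, trim=0.0):
    """Mean after dropping a proportion `trim` of the ordered values from each end."""
    if not 0 <= trim <= 0.5:
        raise ValueError("trim must be between 0 and 0.5")
    ordered = sorted(values)
    # Assumption: the number of values dropped from each end is rounded down.
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k]
    # Assumption: if trimming removes everything, fall back to the median.
    return mean(kept) if kept else median(ordered)


data = [2, 3, 4, 5, 100]          # hypothetical values with one outlier
plain = trimmed_mean(data)         # 22.8, pulled upwards by the outlier
trimmed = trimmed_mean(data, 0.2)  # 4, one value dropped from each end
```

With a trim of 0.2 the smallest and largest of the five values are discarded, so the outlier no longer influences the result.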
Splitting the data into groups based on the values in a categorical column allows the means of different subsets to be compared. The standard deviation can also be calculated as a measure of how close the values tend to be to the mean: a low standard deviation indicates that values tend to be close to the mean, while a high standard deviation indicates that they are spread over a wider range. If the data represent a sample from a larger population, a confidence interval can also be calculated for the mean. This confidence interval is calculated using the t-distribution.
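The per-group calculation can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `compare_means`, the sample data and the use of the sample standard deviation (n - 1 denominator) are hypothetical, and the 95% critical values of the t-distribution are taken from a standard table instead of being computed.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of the t-distribution, keyed by
# degrees of freedom (n - 1). Standard table values, rounded.
T_CRIT_95 = {2: 4.303, 3: 3.182, 4: 2.776}


def compare_means(rows):
    """Summarise (group, value) pairs: mean, SD, SE and 95% CI per group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)

    summary = {}
    for group, values in groups.items():
        n = len(values)
        m = mean(values)
        sd = stdev(values)   # assumption: sample SD with n - 1 denominator
        se = sd / sqrt(n)    # standard error of the mean
        half_width = T_CRIT_95[n - 1] * se
        summary[group] = {
            "n": n,
            "mean": m,
            "sd": sd,
            "se": se,
            "ci": (m - half_width, m + half_width),
        }
    return summary


# Hypothetical data: a categorical column and a numeric column.
rows = [("A", 10.0), ("A", 12.0), ("A", 14.0),
        ("B", 20.0), ("B", 21.0), ("B", 22.0), ("B", 23.0), ("B", 24.0)]
summary = compare_means(rows)
```

Because the critical value depends on the degrees of freedom, smaller groups get wider intervals for the same standard error.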
Output
The example below shows how to compute arithmetic means of CO2 uptake in different plant species using the Compare Means module. The output also includes standard deviation, standard error and the 95% confidence interval.
Parameters
Variable name | Required | Constraints | Description |
---|---|---|---|
test_var | Yes | Column Input: Numeric, Integer | The column with numeric values from which to find the mean. |
group_var | Yes | Column Input: Text, Integer, Boolean, Date, DateTime | A categorical column from which to create groups to compare. |
trim | Yes | Decimal between 0 and 0.5 | The proportion of observations to be trimmed from each end of the values in each group of test_var before computing the results. Default 0. |
remove_na | Yes | Boolean | Whether to remove missing (NA) values from the data before computing the results. |
standard_deviation | Yes | Boolean | Whether to also calculate the standard deviations for each group of the selected column. |
confidence_interval | No | Decimal between 0.5 and 1 | If specified then the standard error and a confidence interval will be calculated for each mean. |
round_digits | No | Integer between -10 and 10 | The number of decimal places to round the results to. Negative values round to powers of 10, e.g. -2 rounds to the nearest 100. |
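
The effect of positive and negative round_digits values can be illustrated with a short Python sketch. The helper name `round_result` is hypothetical, and Python's built-in `round` is used only as a stand-in: the module's handling of exact ties (such as 1250 with -2) may differ.

```python
def round_result(value, round_digits):
    """Round `value` to `round_digits` decimal places (negative: powers of 10)."""
    if not -10 <= round_digits <= 10:
        raise ValueError("round_digits must be between -10 and 10")
    return round(value, round_digits)


to_two_places = round_result(1234.567, 2)     # 1234.57
to_whole = round_result(1234.567, 0)          # 1235.0
to_nearest_100 = round_result(1234.567, -2)   # 1200.0
```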