This document describes the variables of the output data set and summaries used to calculate the values, along with units and any other relevant information.
The first section summarizes relevant parts from the input data codebook. The next section describes the output variables with their units. The last section describes the transformations used to calculate the output values.
The original data set contains a codebook that describes the data set. In this section I repeat the parts of the codebook relevant to the generated data set.
The input data contains data recorded from the accelerometer and the gyroscope of a smartphone while the person who carried the smartphone was performing one of the following six activities: WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING.
From the raw measurements, several other values were derived. The input dataset contains data for the following variables:
| Variable | source | domain |
|---|---|---|
| tBodyAcc-XYZ | accelerometer | time |
| tGravityAcc-XYZ | accelerometer | time |
| tBodyAccJerk-XYZ | accelerometer | time |
| tBodyAccMag | accelerometer | time |
| tGravityAccMag | accelerometer | time |
| tBodyAccJerkMag | accelerometer | frequency |
| fBodyAcc-XYZ | accelerometer | frequency |
| fBodyAccJerk-XYZ | accelerometer | frequency |
| fBodyAccMag | accelerometer | frequency |
| fBodyAccJerkMag | accelerometer | frequency |
| tBodyGyro-XYZ | gyroscope | time |
| tBodyGyroJerk-XYZ | gyroscope | time |
| tBodyGyroMag | gyroscope | time |
| tBodyGyroJerkMag | gyroscope | time |
| fBodyGyro-XYZ | gyroscope | frequency |
| fBodyGyroMag | gyroscope | frequency |
| fBodyGyroJerkMag | gyroscope | frequency |
The table above also shows the source for each variable and the domain it belongs to. Variable names ending with XYZ denote 3-axial variables. The dataset contains one separate variable for each axis.
The original sensor data was recorded at a rate of 50 Hz. The data was grouped into fixed-width sliding windows of 2.56 sec and 50% overlap (128 readings/window). Several functions where applied to these windows to compute the input features. The functions include:
- mean(): Mean value
- std(): Standard deviation
- mad(): Median absolute deviation
- and many others.
In total, each input record contains 651 features.
The accelerometer data is measured in standard gravity units 'g'. The gyroscope data is measured in radians/second.
The input features are normalized and bounded within the interval [-1,1]. During normalization the units have been divided by themselves so units have been cancelled.
The input data is split into two subsets: a training set and a test set. Each subsets consists of three files:
- A features file (train/X_train.txt and test/X_test.txt) with one feature vector per row
- A label with one activity label per row (train/y_train.txt and test/y_test.txt)
- A file with a subject id per row (train/subject_train.txt and test/subject_test.txt)
The names of the features are listed in the file features.txt.
Additionally, the input contains the file activity_labels.txt which links the class labels with their activity name.
The input data set also contains the raw measurement data that was used to compute the features. This raw data is not beeing used in this project.
The following table shows a table with all variables in the output file. The first column in the table shows the column number of the variable, the second column shows the variable name and column three shows the source variable used to compute the value of the variable.
| Colmun | Variable | source variable |
|---|---|---|
| 1 | subjectid | subject id from subjects file |
| 2 | activityname | activity id from labels file |
| 3 | timeBodyAccelerationMeanX | tBodyAcc-mean()-X |
| 4 | timeBodyAccelerationMeanY | tBodyAcc-mean()-Y |
| 5 | timeBodyAccelerationMeanZ | tBodyAcc-mean()-Z |
| 6 | timeBodyAccelerationStandarddeviationX | tBodyAcc-std()-X |
| 7 | timeBodyAccelerationStandarddeviationY | tBodyAcc-std()-Y |
| 8 | timeBodyAccelerationStandarddeviationZ | tBodyAcc-std()-Z |
| 9 | timeGravityAccelerationMeanX | tGravityAcc-mean()-X |
| 10 | timeGravityAccelerationMeanY | tGravityAcc-mean()-Y |
| 11 | timeGravityAccelerationMeanZ | tGravityAcc-mean()-Z |
| 12 | timeGravityAccelerationStandarddeviationX | tGravityAcc-std()-X |
| 13 | timeGravityAccelerationStandarddeviationY | tGravityAcc-std()-Y |
| 14 | timeGravityAccelerationStandarddeviationZ | tGravityAcc-std()-Z |
| 15 | timeBodyAccelerationJerkMeanX | tBodyAccJerk-mean()-X |
| 16 | timeBodyAccelerationJerkMeanY | tBodyAccJerk-mean()-Y |
| 17 | timeBodyAccelerationJerkMeanZ | tBodyAccJerk-mean()-Z |
| 18 | timeBodyAccelerationJerkStandarddeviationX | tBodyAccJerk-std()-X |
| 19 | timeBodyAccelerationJerkStandarddeviationY | tBodyAccJerk-std()-Y |
| 20 | timeBodyAccelerationJerkStandarddeviationZ | tBodyAccJerk-std()-Z |
| 21 | timeBodyGyroscopeMeanX | tBodyGyro-mean()-X |
| 22 | timeBodyGyroscopeMeanY | tBodyGyro-mean()-Y |
| 23 | timeBodyGyroscopeMeanZ | tBodyGyro-mean()-Z |
| 24 | timeBodyGyroscopeStandarddeviationX | tBodyGyro-std()-X |
| 25 | timeBodyGyroscopeStandarddeviationY | tBodyGyro-std()-Y |
| 26 | timeBodyGyroscopeStandarddeviationZ | tBodyGyro-std()-Z |
| 27 | timeBodyGyroscopeJerkMeanX | tBodyGyroJerk-mean()-X |
| 28 | timeBodyGyroscopeJerkMeanY | tBodyGyroJerk-mean()-Y |
| 29 | timeBodyGyroscopeJerkMeanZ | tBodyGyroJerk-mean()-Z |
| 30 | timeBodyGyroscopeJerkStandarddeviationX | tBodyGyroJerk-std()-X |
| 31 | timeBodyGyroscopeJerkStandarddeviationY | tBodyGyroJerk-std()-Y |
| 32 | timeBodyGyroscopeJerkStandarddeviationZ | tBodyGyroJerk-std()-Z |
| 33 | timeBodyAccelerationMagnitudeMean | tBodyAccMag-mean() |
| 34 | timeBodyAccelerationMagnitudeStandarddeviation | tBodyAccMag-std() |
| 35 | timeGravityAccelerationMagnitudeMean | tGravityAccMag-mean() |
| 36 | timeGravityAccelerationMagnitudeStandarddeviation | tGravityAccMag-std() |
| 37 | timeBodyAccelerationJerkMagnitudeMean | tBodyAccJerkMag-mean() |
| 38 | timeBodyAccelerationJerkMagnitudeStandarddeviation | tBodyAccJerkMag-std() |
| 39 | timeBodyGyroscopeMagnitudeMean | tBodyGyroMag-mean() |
| 40 | timeBodyGyroscopeMagnitudeStandarddeviation | tBodyGyroMag-std() |
| 41 | timeBodyGyroscopeJerkMagnitudeMean | tBodyGyroJerkMag-mean() |
| 42 | timeBodyGyroscopeJerkMagnitudeStandarddeviation | tBodyGyroJerkMag-std() |
| 43 | frequencyBodyAccelerationMeanX | fBodyAcc-mean()-X |
| 44 | frequencyBodyAccelerationMeanY | fBodyAcc-mean()-Y |
| 45 | frequencyBodyAccelerationMeanZ | fBodyAcc-mean()-Z |
| 46 | frequencyBodyAccelerationStandarddeviationX | fBodyAcc-std()-X |
| 47 | frequencyBodyAccelerationStandarddeviationY | fBodyAcc-std()-Y |
| 48 | frequencyBodyAccelerationStandarddeviationZ | fBodyAcc-std()-Z |
| 49 | frequencyBodyAccelerationJerkMeanX | fBodyAccJerk-mean()-X |
| 50 | frequencyBodyAccelerationJerkMeanY | fBodyAccJerk-mean()-Y |
| 51 | frequencyBodyAccelerationJerkMeanZ | fBodyAccJerk-mean()-Z |
| 52 | frequencyBodyAccelerationJerkStandarddeviationX | fBodyAccJerk-std()-X |
| 53 | frequencyBodyAccelerationJerkStandarddeviationY | fBodyAccJerk-std()-Y |
| 54 | frequencyBodyAccelerationJerkStandarddeviationZ | fBodyAccJerk-std()-Z |
| 55 | frequencyBodyGyroscopeMeanX | fBodyGyro-mean()-X |
| 56 | frequencyBodyGyroscopeMeanY | fBodyGyro-mean()-Y |
| 57 | frequencyBodyGyroscopeMeanZ | fBodyGyro-mean()-Z |
| 58 | frequencyBodyGyroscopeStandarddeviationX | fBodyGyro-std()-X |
| 59 | frequencyBodyGyroscopeStandarddeviationY | fBodyGyro-std()-Y |
| 60 | frequencyBodyGyroscopeStandarddeviationZ | fBodyGyro-std()-Z |
| 61 | frequencyBodyAccelerationMagnitudeMean | fBodyAccMag-mean() |
| 62 | frequencyBodyAccelerationMagnitudeStandarddeviation | fBodyAccMag-std() |
| 63 | frequencyBodyAccelerationJerkMagnitudeMean | fBodyBodyAccJerkMag-mean() |
| 64 | frequencyBodyAccelerationJerkMagnitudeStandarddeviation | fBodyBodyAccJerkMag-std() |
| 65 | frequencyBodyGyroscopeMagnitudeMean | fBodyBodyGyroMag-mean() |
| 66 | frequencyBodyGyroscopeMagnitudeStandarddeviation | fBodyBodyGyroMag-std() |
| 67 | frequencyBodyGyroscopeJerkMagnitudeMean | fBodyBodyGyroJerkMag-mean() |
| 68 | frequencyBodyGyroscopeJerkMagnitudeStandarddeviation | fBodyBodyGyroJerkMag-std() |
- Column 1 holds the unmodified subject id from the input data.
- Column 2 holds a readable activity name for the activity id in the lables file
- The remaining columns hold the mean of all values for the source variable for a given subject and activity.
The source variables consist of the following parts:
- The first letter defines the domain: t for time or f for frequency
- The next part defines the source variable: BodyAcc (BodyAcceleration), GravityAcc (GravityAcceleration) or BodyGyro (BodyGyroscope)
- The next (optional) part describes the method that was used to derive the variable: Mag (Magnitued), Jerk or JerkMag (JerkMagnitude)
- The part after the dash describes the function that was used to compute the feature mean (Mean) or std (Standarddeviation)
- In case of three axial measurements, the last part of the variable defines the axis (X, Y or Z)
- subjectid: identifier of an observed volunteer within an age bracket of 19-48 years
- Data type: Numeric
- Value range: 1 - 30
- activityname: Label string of the observed activity a person was performing
- Data type: factor
- Labels: WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING
- Rows 3 - 68: Means of selected features per subjectid and activityname
- Data type: The feature means have the same units as their input data. Due to the normalization they do not have units attached see Input Data Units
- Value range: [-1, 1]
- Some input variable names contain the string 'BodyBody'. I considered this to be a typo and transformed it to the string 'Body'.
- The original input data contained duplicated variable names. This does not affect our script since the features with duplicated names are not selected as source for the output data.
To compute the mean values for the features, the input data was grouped by column 1 and column 2. Afterwards the mean for each feature variable per group was computed.
The activity ids were converted to readable activity descriptions by loading the activity descriptions from the file "activity_labels.txt" provieded with the input data.