Data management
The following rules may seem like common sense, or perhaps unnecessary, but each one corresponds to a problem we have encountered in practice.

Data are expensive! It's a sobering exercise to calculate the total cost of a day of data collection, including salaries, travel costs, wages for support staff, and so on. A field trip may well cost US$500 and yield 5 A4 pages of data; that's US$100 per page. At that price, data collection deserves a good deal of care.

Data recording in the field

Use a pre-printed data form. This is a good idea for any kind of survey, but it is essential for recording quantitative data (counts, measurements, etc.). An example of a data form for line transect surveys is shown below.
You can use a field notebook to record any casual natural history observations which don't belong on the data form, but all measurements should go directly on the form, not first into your notebook and transcribed later.
Use black, waterproof ink. Black shows up best when photocopying or scanning soiled data forms. Do not use pencil; you will never, ever want to erase field data! If you need to make a correction, cross out the incorrect figures and write them again, either on the next row or in the 'Comments' column (see the example above). If any measurements are unusually large or small, note this in the 'Comments' column too.

Enter the raw numbers: e.g. if your rangefinder shows feet and inches, record feet and inches on the data form. Conversion to metres can be done later in the office. Don't try to convert in the field: you might make a mistake, and you won't be able to go back and correct it.

If you are using GPS locations, write the coordinates on the data form as well as recording a waypoint in the GPS receiver itself. You then still have the data even if something happens to the GPS unit, and you have an unambiguous record of which location belongs to which observation.

At the end of the session, check that the heading is complete, with date, location, observer's name, etc., including the time the survey ended. If more than one page is used, each page should have the full heading: don't rely on clipping pages together, as that won't survive the photocopier. There's nothing more frustrating than a bundle of data forms, half of which are marked "page 2" and nothing else!

Remember that information on the absence of animals is also important. A data form for a session where nothing was seen (or trapped, or whatever) needs just as much care as one with lots of observations. Fill in the date, location, observer, etc., then write something like "No animals detected" across the form.

Avoid arbitrary codes as far as possible, as they are too easy to muddle up. Use local names, compass directions, up/downhill, etc., abbreviated if necessary. You are less likely to make mistakes with "Medalam river, North" than with "Site 7, transect B".
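Once the forms are back in the office, the conversion of raw rangefinder readings to metres is simple arithmetic: 1 inch is exactly 0.0254 m and 1 foot is 12 inches. The Python sketch below assumes each reading was recorded as separate feet and inches values; the function name `feet_inches_to_metres` and the example readings are hypothetical.

```python
# Hypothetical helper for converting raw rangefinder readings, recorded in
# the field as feet and inches, into metres back in the office.
INCH_IN_METRES = 0.0254  # exact, by international definition
INCHES_PER_FOOT = 12


def feet_inches_to_metres(feet: float, inches: float = 0.0) -> float:
    """Convert a distance given as feet plus inches into metres."""
    total_inches = feet * INCHES_PER_FOOT + inches
    return total_inches * INCH_IN_METRES


# Invented raw readings, copied from a data form as (feet, inches) pairs.
readings = [(32, 6), (15, 0), (48, 11)]
for feet, inches in readings:
    metres = feet_inches_to_metres(feet, inches)
    print(f"{feet} ft {inches} in = {metres:.2f} m")
```

Because the raw feet and inches stay on the data form, any conversion error in the office can always be traced and corrected, which is not possible for a conversion done in your head in the field.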
If you go back to the same locations again, even 10 years later, use the same names. (And if the old name was "Site 7, transect B", you can still write "Site 7 = Medalam, transect B = North".)

Back in camp

Check and copy the original data as soon as possible after collection, i.e. the same day, or the next morning if it's a night survey. Any changes or additions should be clearly distinguishable from the original data: use a different colour ink, or pencil. If any numbers are not clear, write them again in the 'Comments' column.

If possible, get someone else to copy your data onto a fresh data form; they should ask if anything is not clear. Then check that they have done it correctly. You now have two clear, legible copies of the data. Keep them separate! Put them in separate packs when you trek out. Leave the copy with a colleague or field assistant. Make a photocopy at the first opportunity and take that home. Take the original to the office.

Auditing data collection

There have been cases where researchers did not do the field work they claimed, but simply faked the data. This usually comes to light sooner or later, but not until a good deal of time and money has been wasted, and then the data produced by everyone else in the organization are viewed with suspicion. So it is in everybody's interest to have a system for independently checking data collection.

How the audit is done will depend on the type of survey. For nest or dung surveys, a spot-check can be done a few days after the survey. In other cases, data may be collected specifically for audit purposes, for example the names and addresses of local guides and porters. This information should be carefully preserved, but may not need to be entered into the computer for analysis.

Keying data into a computer

Transcribing data is a source of errors. Data should only need to be keyed in once, and any further processing should be automated.
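Automated processing can be as simple as a lookup table. As an illustration only, the Python sketch below translates old arbitrary codes into descriptive names, following the "Site 7 = Medalam, transect B = North" convention. That single entry is the only one taken from the text; the table names, the function `site_label` and the example row are hypothetical. An unknown code raises an error instead of being passed through, so a mistyped code is caught rather than silently becoming a new site.

```python
# Hypothetical lookup tables translating old arbitrary codes into descriptive
# names. Only the Site 7 / transect B entry comes from the text; further
# entries would be added from the project's own records.
SITE_NAMES = {"Site 7": "Medalam"}
TRANSECT_NAMES = {("Site 7", "B"): "North"}


def site_label(site_code: str, transect_code: str) -> str:
    """Return a descriptive label such as 'Medalam, North' for an old code pair.

    Raises KeyError for unknown codes, so a mistyped code is reported
    instead of silently becoming a new site.
    """
    site = SITE_NAMES[site_code]
    transect = TRANSECT_NAMES[(site_code, transect_code)]
    return f"{site}, {transect}"


# Invented example row, as keyed in from an old data form.
rows = [{"site": "Site 7", "transect": "B", "count": 3}]
for row in rows:
    row["location"] = site_label(row["site"], row["transect"])
    print(row)
```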
The computer format therefore needs to be 'software-friendly' as well as user-friendly. If you want to keep all the corrections, comments, etc. in computerized form, consider scanning the data forms and saving them as .pdf files.

Data should be entered in a spreadsheet, e.g. MS Excel®, OpenOffice Calc, or equivalent. A database built with software such as MS Access, Base or FileMaker requires a huge amount of work to set up and maintain, and is only worthwhile for major projects. To make data entry and checking easier, the order of columns in the spreadsheet should match the order on the data form; columns can be cut and pasted later to change the order, if required. Be careful when entering dates into spreadsheets: see the rules here.

GPS locations should be downloaded from the GPS receiver, copied into the spreadsheet, and then checked against the original data form to ensure that locations match observations. Check the datum setting of the GPS and/or your download software: it can make a difference of up to 500 m. Coordinates can be converted from one datum to another, but not if your spreadsheet contains a mix of values with different datums. Printing the data as a map will help in checking, even if the 'map' is only a scatterplot produced in Excel.

Avoid mixing text and numbers in the same column, remembering that '>50' or '5?' or '25cm' count as text. Use a consistent method for indicating missing data: leave the cell blank, or enter '-' or 'na' ('not available'). Don't use '0' or '-1' or '9999999', even if those values are obviously wrong; if you later calculate the mean of the column, you will get a spurious value! There are various options for dealing with missing data in the analysis, but at the data entry stage, just make it clear that the value is missing. And of course, enter all observations, even those that are incomplete.
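A quick check before analysis catches both of the problems just described. The Python sketch below is a hypothetical helper, not part of any particular package. It takes a column of cells as text (as they would come from a .csv export), treats blank, '-' and 'na' as missing, and flags anything else that is not a plain number, such as '>50', '5?' or '25cm'. The example column is invented; the last line shows how coding missing values as 9999999 distorts the mean.

```python
from statistics import mean

# Missing-value markers recommended in the text: blank, '-' or 'na'.
MISSING = {"", "-", "na"}


def check_column(cells):
    """Split a column of raw text cells into numbers, missing cells and problems.

    Returns (values, n_missing, problems), where problems lists
    (position_in_column, cell) pairs for text such as '>50', '5?' or '25cm'.
    Positions count from 1 and do not include any header row.
    """
    values, n_missing, problems = [], 0, []
    for position, cell in enumerate(cells, start=1):
        text = cell.strip()
        if text.lower() in MISSING:
            n_missing += 1
            continue
        try:
            values.append(float(text))
        except ValueError:
            problems.append((position, cell))
    return values, n_missing, problems


# Invented example column mixing numbers, missing markers and text.
column = ["12", "", "na", "15", ">50", "25cm", "-", "9"]
values, n_missing, problems = check_column(column)
print("numeric values:", values)
print("missing cells:", n_missing)
print("cells to fix:", problems)
print("mean of recorded values:", mean(values))

# Coding the same missing values as 9999999 instead gives a spurious mean.
print("mean with 9999999 codes:", mean(values + [9999999.0] * n_missing))
```

The flagged cells should be checked against the original data form; a '>50' might belong in the 'Comments' column with the number in the data column, but that is a decision for the observer, not the script.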
Get someone else to key the data into the spreadsheet, then print out the data (just the relevant rows, not the whole spreadsheet) and pass the printout to the original observer to check against the original data form.

Archiving and distributing data

In most jurisdictions, the raw data collected belong to the organization financing the field work, and if government funding is involved, the data must be made available to other scientists. Funding agencies generally allow ample time for researchers to analyze and publish results, but you cannot consign the raw data to the recycle bin once the analysis is done.

Data sets should be archived and distributed as plain text, i.e. NOT as .xls or .ods or other spreadsheet files. They should be converted to ASCII (American Standard Code for Information Interchange) text files, either tab-separated (.txt) or comma-separated (.csv). Hints on exporting to these formats from spreadsheet software are here.
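If the data are already being processed in a script rather than a spreadsheet, the export can be done there too. The Python sketch below uses only the standard `csv` module to write the same table as comma-separated (.csv) and tab-separated (.txt) ASCII files; the file names, column names and rows are invented for illustration.

```python
import csv

# Invented example rows, as they might look after keying in a data form.
# Column names are illustrative only; the blank distance marks missing data.
header = ["date", "location", "observer", "distance_m", "count"]
rows = [
    ["2007-03-01", "Medalam river, North", "Observer A", "9.91", "3"],
    ["2007-03-01", "Medalam river, North", "Observer A", "", "1"],
]


def write_delimited(path, header, rows, delimiter):
    """Write header and rows to a plain-text file with the given delimiter."""
    # newline="" lets the csv module control line endings itself;
    # encoding="ascii" raises an error if any non-ASCII character slipped in.
    with open(path, "w", newline="", encoding="ascii") as f:
        writer = csv.writer(f, delimiter=delimiter)
        writer.writerow(header)
        writer.writerows(rows)


write_delimited("survey.csv", header, rows, delimiter=",")   # comma-separated
write_delimited("survey.txt", header, rows, delimiter="\t")  # tab-separated
```

Note that a field containing the delimiter, such as "Medalam river, North" in the .csv file, is automatically enclosed in double quotes, so the file can still be read back correctly by other software.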
For more on managing and curating data, see Gotelli and Ellison (2004), chapter 8.
Text by Mike Meredith, updated 14 September 2007