Ethics as a Data Scientist

Data Visualisation and Analytics

Anastasios Panagiotelis and Lauren Kennedy

Lecture 04

Why does ethical practice matter?

  • Legal responsibility
  • Institutional requirements
  • Moral & Social responsibility

How to consider

  • Not about following a series of rules
  • About having a discussion about risk, risk mitigation and benefits
  • Not a single decision, but an ongoing conversation

Things to consider

  • Data collection
  • Data storage
  • Analysis
  • Communication
  • Deployment

Data collection

  • Often happens before data scientists/statisticians are involved in the project
  • Can determine the structure and nature of the data we receive

Informed consent

  • Do participants understand how the data will be used?
  • Are participants capable of giving consent (e.g., children)?
  • Is there coercion?
  • Do participants understand the risks?
  • What happens to any incentive if participants drop out halfway through?

Collection bias

  • Who has the potential to be represented in the data?
  • How was the data collected?
  • What are the limitations in the data collection method?
  • Will the collection method limit generalization?

Limit exposure

  • Only collect the data you need
  • Do you need to collect contact information?
    • What happens if you need to contact someone (e.g., test results)?
  • If you do collect personal information, does it need to be linked to the rest of the data?
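
One way to keep personal information unlinked from the rest of the data is to split each record in two and connect the halves only through a random study ID. The following is a minimal sketch under that assumption; the function name `split_identifiers`, the field names and the example records are all hypothetical and are not from the lecture.

```python
import secrets


def split_identifiers(records, personal_fields):
    """Split each record into a contact row and an analysis row.

    The two rows share only a random study ID, so the analysis table can be
    stored and analysed without names or contact details.
    """
    contact_rows, analysis_rows = [], []
    for record in records:
        # Random ID: it is not derived from anything about the person.
        study_id = secrets.token_hex(8)
        contact = {k: v for k, v in record.items() if k in personal_fields}
        analysis = {k: v for k, v in record.items() if k not in personal_fields}
        contact_rows.append({"study_id": study_id, **contact})
        analysis_rows.append({"study_id": study_id, **analysis})
    return contact_rows, analysis_rows


# Hypothetical example records, for illustration only.
records = [
    {"name": "A. Person", "phone": "000-0000", "age": 34, "test_result": "negative"},
    {"name": "B. Person", "phone": "000-0001", "age": 51, "test_result": "positive"},
]
contact_table, analysis_table = split_identifiers(records, {"name", "phone"})
```

The contact table would then be stored separately, with tighter access, and consulted only when someone actually needs to be contacted. Removing names does not by itself make data anonymous, since combinations of the remaining fields can still identify people.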

Data storage

Data security

  • Often determined by the data stakeholders (generally whoever collects the data, but not always)
  • If you didn't collect the data, and haven't been told how to store it, ask whoever gave it to you
  • As a general rule, if you don't have permission to share someone else's data, don't.

Data storage

  • Try not to store even public data on sites like GitHub.
    • Public, freely available data often require consent/agreement to particular stewardship practices
    • Instead, provide directions on how to download the data, and state when you downloaded it
  • If data is sent to you using a particular encryption procedure, use that procedure as the default for all future transfers of that data
    • for example, an encrypted zip file with the password sent via text message
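
Instead of committing the data itself, a repository can hold a small provenance file saying where the data came from and when it was downloaded. The sketch below is one possible way to do that; the function names, the file name `DATA_SOURCE.json`, the example URL and the idea of adding a checksum are assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from datetime import date
from pathlib import Path


def provenance_record(data_path, source_url, downloaded_on=None):
    """Describe a local copy of a dataset without redistributing it."""
    digest = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()
    return {
        "source_url": source_url,
        "downloaded_on": (downloaded_on or date.today()).isoformat(),
        "sha256": digest,  # lets others check they obtained the same file
        "file_name": Path(data_path).name,
    }


def write_provenance(record, out_path):
    """Save the record as JSON; this file is what gets committed."""
    Path(out_path).write_text(json.dumps(record, indent=2))


# Illustration with a throwaway file and a placeholder URL.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "survey.csv"
    data_file.write_bytes(b"id,answer\n1,yes\n")
    record = provenance_record(
        data_file, "https://example.org/survey.csv", date(2024, 1, 15)
    )
    write_provenance(record, Path(tmp) / "DATA_SOURCE.json")
    saved = json.loads((Path(tmp) / "DATA_SOURCE.json").read_text())
```

Anyone repeating the analysis can then download the data themselves, agree to whatever terms the provider sets, and compare checksums to see whether the source has changed since.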

Right to be forgotten

Data retention plan

  • Is there a plan to archive the data for replication of results? How will you ensure that the meaning of the data and any idiosyncrasies are recorded?
  • How will you protect the security of the data in the future? Who will be responsible? Do you have a plan for changing archival procedures as technology changes (e.g., storing on floppy disks)?
  • Will data be deleted after a certain period of time? If so, who is responsible for doing that? Who confirms that it has been completed adequately?
  • Will the data be kept for future analyses? Did the individuals consent to this analysis? How will permission be granted?
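
If the plan says data is deleted after a fixed period, a scheduled check can at least list which records are overdue, so that the responsible person has something concrete to act on and confirm. This is a minimal sketch; the function name, the record IDs and the five-year period are hypothetical.

```python
from datetime import date, timedelta


def records_due_for_deletion(collected_on_by_id, retention_days, today):
    """Return the IDs of records collected before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(
        record_id
        for record_id, collected_on in collected_on_by_id.items()
        if collected_on < cutoff
    )


# Hypothetical collection dates and a five-year (5 * 365 days) retention period.
collected = {"r1": date(2015, 3, 1), "r2": date(2023, 6, 1)}
overdue = records_due_for_deletion(
    collected, retention_days=5 * 365, today=date(2024, 1, 1)
)
```

Here `overdue` contains only `"r1"`. The check covers only the data it is pointed at; backups and copies held by collaborators need their own plan.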

Data fairness

Data

  • Sampling bias (where your sample doesn't represent the population you're interested in)
  • Selective labels (where there is more measurement error for some parts of your sample than others)
  • Systematic error (where societal differences can distort the conclusions you draw from the data)
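
To see how sampling bias can change a conclusion, compare a raw sample mean with one reweighted to known population shares (post-stratification). This technique is not named in the slides; it is used here only as an illustration, and the groups, values and shares are made up.

```python
def poststratified_mean(sample, population_share):
    """Average group means, weighting each group by its population share.

    sample: list of (group, value) pairs.
    population_share: dict mapping group -> share of the population.
    """
    values_by_group = {}
    for group, value in sample:
        values_by_group.setdefault(group, []).append(value)
    missing = set(population_share) - set(values_by_group)
    if missing:
        # Weighting cannot recover groups that were never sampled.
        raise ValueError(f"no sample data for groups: {sorted(missing)}")
    return sum(
        share * sum(values_by_group[group]) / len(values_by_group[group])
        for group, share in population_share.items()
    )


# Made-up sample: "young" respondents are over-represented (8 of 10).
sample = [("young", 1.0)] * 8 + [("old", 0.0)] * 2
raw_mean = sum(value for _, value in sample) / len(sample)
adjusted_mean = poststratified_mean(sample, {"young": 0.5, "old": 0.5})
```

The raw mean is 0.8, while the reweighted mean is 0.5. Reweighting only helps for characteristics that were measured and for groups that appear in the sample at all.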

Model

  • Modelling choices and justification
  • Interpretability
  • Evaluation
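
For evaluation, one common check is to report metrics separately for each group and not only overall, since a good average can hide poor performance for some people. The sketch below assumes binary labels coded 0/1; the function name, the groups and the labels are hypothetical.

```python
def rates_by_group(y_true, y_pred, groups):
    """Compute accuracy and false positive rate separately for each group."""
    counts = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts.setdefault(
            group, {"n": 0, "correct": 0, "negatives": 0, "false_pos": 0}
        )
        c["n"] += 1
        c["correct"] += int(truth == pred)
        if truth == 0:
            c["negatives"] += 1
            c["false_pos"] += int(pred == 1)
    return {
        group: {
            "accuracy": c["correct"] / c["n"],
            # Undefined when the group has no true negatives.
            "false_positive_rate": (
                c["false_pos"] / c["negatives"] if c["negatives"] else None
            ),
        }
        for group, c in counts.items()
    }


# Made-up labels and predictions for two groups of four.
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
report = rates_by_group(y_true, y_pred, groups)
```

In this made-up example the overall accuracy is 5/8, but group "a" is classified perfectly while group "b" has an accuracy of 0.25 and every true negative flagged as positive.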

Communication

  • Do the analyses, summaries and visualizations accurately reflect the data?
  • Are the results reported with reflection on the limitations of the data and analysis?
  • Do the data, model and assumptions justify the generalizations made from the data?
  • Have we reported our results in sufficient detail to communicate the decisions made in conducting the analysis (reproducibility)?
  • Have we sought feedback from relevant members of the communities that our results are likely to affect?

Deployment

  • Do you have a plan for halting the use of your model once it is in production?
  • For models that will be updated, do we have a plan to evaluate whether the algorithm will become less fair?
  • For models that will not be updated, do we clearly state what the results generalize to, or have a plan for identifying when the results no longer hold (e.g., due to society changing)?
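
A halting plan is easier to follow when the decision rule is written down before deployment. The sketch below treats the gap in an error rate between groups as the fairness measure and compares a candidate update against the current model. The measure, the thresholds and the function names are all assumptions made for illustration; choosing them is part of the ongoing conversation and not something code can settle.

```python
def fairness_gap(rate_by_group):
    """Largest difference in a rate (e.g., error rate) between any two groups."""
    rates = list(rate_by_group.values())
    return max(rates) - min(rates)


def review_update(previous_rates, candidate_rates, max_gap, max_increase):
    """Decide what to do with a model update, using agreed thresholds.

    Returns "halt" if the candidate's gap exceeds max_gap, "review" if the gap
    grew by more than max_increase, and "deploy" otherwise.
    """
    before = fairness_gap(previous_rates)
    after = fairness_gap(candidate_rates)
    if after > max_gap:
        return "halt"
    if after - before > max_increase:
        return "review"
    return "deploy"


# Made-up error rates per group for the current model and two candidates.
current = {"a": 0.10, "b": 0.12}
decision_small_drift = review_update(
    current, {"a": 0.10, "b": 0.19}, max_gap=0.15, max_increase=0.05
)
decision_large_gap = review_update(
    current, {"a": 0.10, "b": 0.30}, max_gap=0.15, max_increase=0.05
)
```

With these made-up numbers the first candidate is sent for review and the second is halted. The same check can be run on a schedule for a model that is never updated, since its error rates can still drift as the population changes.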

Resources
