Ethics as a Data Scientist
Data Visualisation and Analytics
Anastasios Panagiotelis and Lauren Kennedy
Lecture 04


Why does ethical practice matter?
Legal responsibility
Institutional requirements
Moral and social responsibility


How to consider ethics
It is not about following a series of rules
It is about having a discussion about risks, how to mitigate them, and benefits
It is not a single decision, but an ongoing conversation


Things to consider
Data collection
Data storage
Analysis
Communication
Deployment

Data collection


Often happens before data scientists/statisticians are involved in the project
Can determine the structure and nature of the data we receive


Informed consent
Do participants understand how the data will be used?
Are participants capable of giving consent (e.g., children)?
Is there coercion?
Do participants understand the risks?
What happens to any incentive if participants drop out halfway through?


Collection bias
Who has the potential to be represented in the data?
How was the data collected?
What are the limitations in the data collection method?
Will the collection method limit generalization?


Limit exposure
Only collect the data you need
Do you need to collect contact information? What happens if you need to contact someone (e.g., with test results)?
If you do collect personal information, does it need to be linked to the rest of the data?
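One common way to avoid linking personal information to the rest of the data is pseudonymisation: move the identifying fields into a separate table joined only by a random key, and store that table with tighter access. The sketch below is a minimal illustration under assumptions: records are plain dicts, and the field names (`name`, `phone`, `result`) and the function `split_identifiers` are hypothetical, not from any library.

```python
import secrets

# Hypothetical fields treated as directly identifying in this example.
IDENTIFYING_FIELDS = {"name", "phone"}


def split_identifiers(records):
    """Split records into (analysis_rows, key_table).

    Each record gets a random key. Identifying fields go only into key_table,
    which would be stored separately with tighter access; the analysis rows
    keep the remaining fields plus the key.
    """
    analysis_rows = []
    key_table = {}
    for record in records:
        key = secrets.token_hex(8)  # random, so not derivable from the person's details
        while key in key_table:  # guard against the (very unlikely) collision
            key = secrets.token_hex(8)
        key_table[key] = {f: v for f, v in record.items() if f in IDENTIFYING_FIELDS}
        row = {f: v for f, v in record.items() if f not in IDENTIFYING_FIELDS}
        row["key"] = key
        analysis_rows.append(row)
    return analysis_rows, key_table


# Made-up records for illustration only.
records = [
    {"name": "A. Smith", "phone": "0400 000 000", "result": "positive"},
    {"name": "B. Jones", "phone": "0400 000 001", "result": "negative"},
]
rows, keys = split_identifiers(records)
print(rows)  # each row holds only 'result' and 'key'
```

Contacting someone (for example, with test results) then requires access to the separately stored key table, and deleting that table later breaks the link entirely.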

Data storage


Data security
Often determined by the data stakeholders (generally whoever collected the data, but not always)
If you didn't collect the data and haven't been told how to store it, ask whoever gave it to you
As a general rule, if you don't have permission to share someone else's data, don't.


Data storage
Avoid storing data on sites like GitHub, even if the data are public. Public, freely available data often require consent to, or agreement with, particular stewardship practices
Instead, provide directions on how to download the data, and record when you downloaded it
If data is sent to you using a particular procedure, use the same encryption by default for all future data transfers, for example an encrypted zip file with the password sent separately via text message
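Instead of committing the data itself, a repository can hold a short provenance note for each file: where it came from, when it was downloaded, and a checksum so others can confirm they obtained the same version. The sketch below assumes a local file has just been downloaded; the function name `download_note`, the file name and the URL are placeholders, and the timestamp records when the note is made, so it should be run straight after the download.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def download_note(path, source_url):
    """Return a provenance record for a data file obtained from source_url.

    The timestamp is the time this note is made (UTC), and the SHA-256
    checksum lets others confirm they obtained the same version of the file.
    """
    path = Path(path)
    return {
        "source_url": source_url,
        "file": path.name,
        "downloaded_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
    }


# Stand-in file and placeholder URL; in practice use the real download.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "survey.csv"
    data_file.write_bytes(b"id,score\n1,3\n")
    note = download_note(data_file, "https://example.org/survey.csv")

print(json.dumps(note, indent=2))
```

The printed JSON could be saved in the repository (for example in a hypothetical `DATA_SOURCES.json`) alongside the download instructions, while the data file stays out of version control.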

Right to be forgotten

The EU General Data Protection Regulation (GDPR), adopted in 2016 and applicable since May 2018, includes a right to be forgotten (Article 17, "Right to erasure"). Under it, individuals can request that their personal data be erased, subject to certain conditions. For more, see https://gdpr-info.eu/issues/right-to-be-forgotten/.
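Honouring an erasure request means finding every place a person's data is held. The sketch below assumes, hypothetically, that each table is a list of row dicts sharing a `subject_id` field; `erase_subject` is an illustrative name, and real systems also need to handle backups, exports and copies held by others, which this does not cover.

```python
def erase_subject(tables, subject_id, id_field="subject_id"):
    """Remove every row belonging to subject_id from each table in `tables`.

    `tables` maps a table name to a list of row dicts. Returns a per-table
    count of removed rows, which can serve as a record that the request was
    actioned without itself containing the erased personal data.
    """
    removed = {}
    for name, rows in tables.items():
        kept = [row for row in rows if row.get(id_field) != subject_id]
        removed[name] = len(rows) - len(kept)
        rows[:] = kept  # modify in place so other references see the change
    return removed


# Made-up tables for illustration only.
tables = {
    "responses": [{"subject_id": 1, "q1": 4}, {"subject_id": 2, "q1": 5}],
    "contacts": [{"subject_id": 1, "email": "a@example.org"}],
}
log = erase_subject(tables, subject_id=1)
print(log)  # {'responses': 1, 'contacts': 1}
```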


Data retention plan
Is there a plan to archive the data for replication of results? How will you ensure that the meaning of the data and any idiosyncrasies are recorded?
How will you protect the security of the data in the future? Who will be responsible? Do you have a plan for changing archival procedures as technology changes (e.g., data stored on floppy disks)?
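One way to record the meaning of archived data and its idiosyncrasies is to write a codebook (data dictionary) next to the data file and refuse to archive any column that is undocumented. The sketch below is an assumption-laden illustration: the function `archive_with_codebook`, the file naming, the column names and the `-9` coding are all hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def archive_with_codebook(rows, codebook, out_dir, name):
    """Write rows to <name>.csv plus a matching <name>.codebook.json.

    The codebook describes each column (meaning, units, known quirks) so the
    archive stays interpretable later. Raises ValueError, before writing
    anything, if any column is missing from the codebook.
    """
    columns = list(rows[0].keys())
    missing = [c for c in columns if c not in codebook]
    if missing:
        raise ValueError(f"undocumented columns: {missing}")
    out_dir = Path(out_dir)
    csv_path = out_dir / f"{name}.csv"
    book_path = out_dir / f"{name}.codebook.json"
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    book_path.write_text(json.dumps(codebook, indent=2))
    return csv_path, book_path


# Made-up data and codebook for illustration only.
rows = [{"id": 1, "income": -9}, {"id": 2, "income": 52000}]
codebook = {
    "id": {"description": "Random participant identifier"},
    "income": {"description": "Annual income", "units": "AUD",
               "quirks": "-9 means 'prefer not to say'"},
}
with tempfile.TemporaryDirectory() as tmp:
    csv_path, book_path = archive_with_codebook(rows, codebook, tmp, "survey")
    saved = json.loads(book_path.read_text())
```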


Data retention plan
Will data be deleted after a certain period of time? If so, who is responsible for doing that? Who confirms that it has been completed adequately?
Will the data be kept for future analyses? Did the individuals consent to such analyses? How will permission be granted?
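The two questions above (when data must be deleted, and whether it may be reused) can be made checkable if each record carries its collection date and the scope of consent. The sketch below assumes hypothetical fields `collected` and `consent_future_use`, and an arbitrary three-year retention period; the real period comes from the ethics approval or data agreement, and someone still has to carry out and confirm the deletion.

```python
from datetime import date, timedelta


def apply_retention(records, retention_days, today):
    """Split records into (keep, delete) by age against a retention period."""
    cutoff = today - timedelta(days=retention_days)
    keep = [r for r in records if r["collected"] >= cutoff]
    delete = [r for r in records if r["collected"] < cutoff]
    return keep, delete


def usable_for_new_analysis(records):
    """Keep only records whose participants consented to future use."""
    return [r for r in records if r.get("consent_future_use", False)]


# Made-up records for illustration only.
records = [
    {"id": 1, "collected": date(2020, 3, 1), "consent_future_use": True},
    {"id": 2, "collected": date(2023, 6, 1), "consent_future_use": False},
    {"id": 3, "collected": date(2023, 9, 1), "consent_future_use": True},
]
keep, delete = apply_retention(records, retention_days=3 * 365, today=date(2024, 1, 1))
print([r["id"] for r in delete])  # [1]
print([r["id"] for r in usable_for_new_analysis(keep)])  # [3]
```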

Data fairness


Data
Sampling bias (where your sample doesn't represent the population you're interested in)
Selective labels (where there is more measurement error for some parts of your sample than others)
Systematic error (where societal differences can distort the conclusions you draw from data)
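A first check for sampling bias is to compare each group's share of the sample with its share of the target population. The sketch below assumes group labels are already recorded for each observation and that reliable population shares exist (for example from a census); the age groups and shares used here are illustrative, not real figures, and `representation_gaps` is a hypothetical name. It cannot detect groups that were never recorded at all.

```python
from collections import Counter


def representation_gaps(sample_groups, population_shares):
    """Compare each group's share of the sample with its population share.

    sample_groups: one group label per observation.
    population_shares: dict mapping group -> proportion in the target population.
    Returns dict group -> (sample share, population share, difference).
    """
    counts = Counter(sample_groups)
    n = sum(counts.values())
    gaps = {}
    for group, pop_share in population_shares.items():
        sample_share = counts.get(group, 0) / n
        gaps[group] = (sample_share, pop_share, sample_share - pop_share)
    return gaps


sample = ["18-34"] * 60 + ["35-64"] * 30 + ["65+"] * 10
population = {"18-34": 0.30, "35-64": 0.48, "65+": 0.22}  # illustrative shares
for group, (s, p, d) in representation_gaps(sample, population).items():
    print(f"{group}: sample {s:.2f}, population {p:.2f}, difference {d:+.2f}")
```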


Model
Modelling choices and justification
Interpretability
Evaluation
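For evaluation, a single overall score can hide poor performance for some groups, so one option is to report metrics separately per group. The sketch below assumes binary 0/1 labels and uses accuracy and false positive rate purely as examples; which metrics are appropriate is itself a modelling choice to justify. The function name and data are hypothetical.

```python
from collections import defaultdict


def metrics_by_group(y_true, y_pred, groups):
    """Accuracy and false positive rate for each group (binary 0/1 labels)."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "neg": 0, "false_pos": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(t == p)
        if t == 0:  # only true negatives contribute to the false positive rate
            s["neg"] += 1
            s["false_pos"] += int(p == 1)
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            # None when a group has no true negatives, so FPR is undefined
            "fpr": s["false_pos"] / s["neg"] if s["neg"] else None,
        }
        for g, s in stats.items()
    }


# Made-up labels and predictions for illustration only.
y_true = [0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(metrics_by_group(y_true, y_pred, groups))
```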


Communication
Do the analyses, summaries and visualizations accurately reflect the data?
Are the results reported with reflection on the limitations of the data and analysis?
Do the data, model and assumptions justify the generalizations made from the data?
Have we reported our results in sufficient detail to communicate the decisions made in conducting the analysis (reproducibility)?
Have we sought feedback from relevant members of the communities that our results are likely to affect?
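Part of reporting results in enough detail to reproduce them is recording the computational environment alongside the analysis. The sketch below captures the Python version, operating system, random seed and package versions using only the standard library; the package list and the seed value are arbitrary examples, and `environment_record` is a hypothetical name.

```python
import json
import platform
from importlib import metadata


def environment_record(packages, seed):
    """Collect details needed to rerun an analysis: Python version, OS,
    random seed and installed versions of the listed packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
        "packages": versions,
    }


record = environment_record(["numpy", "pandas"], seed=2024)
print(json.dumps(record, indent=2))
```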

Deployment


Do you have a plan for halting the use of your model once it is in production?
For models that will be updated: do we have a plan to evaluate whether the algorithm becomes less fair?
For models that will not be updated: do we clearly state the conditions under which the results generalize, or have a plan for identifying when they no longer hold (e.g., due to changes in society)?
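A plan for halting a model can include an automatic check on each new batch of predictions that flags when fairness has degraded. The sketch below uses the gap in positive-prediction rates between groups as one example measure and compares it with a baseline measured at deployment; the measure, the 0.05 tolerance, the "halt" action and the function names are all hypothetical choices, and a real plan would also specify who reviews the flag and decides.

```python
def positive_rate_gap(predictions, groups):
    """Largest difference in positive-prediction rate between any two groups."""
    totals, positives = {}, {}
    for p, g in zip(predictions, groups):
        totals[g] = totals.get(g, 0) + 1
        positives[g] = positives.get(g, 0) + int(p == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)


def check_batch(predictions, groups, baseline_gap, tolerance=0.05):
    """Return ('halt', gap) if the gap exceeds baseline + tolerance, else ('ok', gap)."""
    gap = positive_rate_gap(predictions, groups)
    return ("halt" if gap > baseline_gap + tolerance else "ok"), gap


# Made-up batch: group a receives far more positive predictions than group b.
status, gap = check_batch(
    [1, 1, 1, 0, 1, 0, 0, 0], ["a"] * 4 + ["b"] * 4, baseline_gap=0.10
)
print(status, round(gap, 2))  # halt 0.5
```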

Resources


Ethical practice checklist: https://deon.drivendata.org/
Full text of the GDPR: https://gdpr-info.eu/
The National Statement on Ethical Conduct in Human Research: https://www.nhmrc.gov.au/about-us/publications/national-statement-ethical-conduct-human-research-2007-updated-2018
