Technology Department Team Goals and Status for Q1 FY19/20 in support of the Medium Term Plan (MTP) Priorities and Annual Plan for FY19/20
Analytics
Team Manager: Nuria Ruiz
- Make it easier to understand the history of all Wikimedia projects
- Release MediaWiki History in JSON/CSV or MySQL dump format (the best dataset for measuring content and contributors), task T208612: Partially done
 
- Make it easier to understand how Commons media is used across our projects.
- Start work on the mediarequests API to provide view statistics for individual Wikimedia images, task T210313: In progress
 
- Increase Data Quality 
- Entropy-based alarms for data issues, task T215863: Partially done; work will continue in Q2
 
- Increase Data Privacy and Security 
- Make the Kerberos infrastructure production-ready, task T226089: In progress; will continue into Q2
 
- Modern Event Platform 
- Continue moving events from the job queue to EventGate main, task T211248: Done
- Development work for Kafka Connect, task T223626: Postponed to next quarter
- Schema repository CI for enforcing conventions and backwards compatibility: Done
 
- Operational Excellence. Increase Resilience of Systems
- New ZooKeeper cluster for tier-2, task T217057: In progress
 
- Operational Excellence. Reduce Operational Load by Phasing Out Legacy Systems
- Sunset the MySQL data store for EventLogging, task T159170: In progress this quarter and next
 
Status
- July 23, 2019 - In progress
- We will be moving all Kafka Connect work to next quarter due to licensing issues, so it is marked as not done for this quarter.
- Migration of events to EventGate main has been rolling out without issues.
- Work on the mediarequests API has started; the API will probably be in service next quarter.
- Entropy-based alarms for data issues: work is done for this quarter and will be picked up again next quarter.
 
- August 2019 - To do
- We are waiting for survey responses to finalize the format of the MediaWiki History release.
- Work continues on the mediarequests API. We are backfilling this data in Hive; once that is done, we can move it to Druid or Cassandra so it can be served via the API.
- Migration of events from the job queue to EventGate main is ahead of schedule, with no production issues.
 
- September 2019 - To do
- Overall, we will be finishing much of the work in progress early next quarter.
 
- MediaWiki History data is public. We are working on the Hadoop process that will publish the files, as the current dumps infrastructure cannot publish files fast enough.
 
- Work on the new mediarequests API planned for this quarter is done; a deployment is pending to make it public.
- The ZooKeeper task is slightly blocked on hardware; we are working with DC Ops.
 
Core Platform
Team Manager: Corey Floyd
 To do - Kick off the Front End Working Group to explore recommendations from the Q4 research and identify a project to begin working on in Q2 (PE, Reduce Complexity of the Platform)
 To do - Build out platform infrastructure for partner APIs to support better access and increased load (PE, Tech and Product Partnerships)
 To do - Develop Multi-DC storage solution(s) to hold the remaining content in the main stash in order to unblock the move to Multi-DC reads (Core)
Dependencies on: Product (front end working group and API work), SRE and Performance (Multi-DC main stash)
Status
- July 25, 2019 - In progress
- Kickoff of the working group is going well; it started this week with Technology and Product.
- Platform infrastructure build-out: In progress; we are waiting on ongoing Parsoid work to be completed (deploying APIs)
- Multi-DC storage solutions: In progress; figuring out possible alternative solutions
 
- August 2019 - To do
- September 2019 - To do
Fundraising Tech
Team Manager: Erika Bjune
 To do - Get the India form to its first 1-hour test and continue further development
 To do - Get the recurring up-sell to its first 1-hour test and continue further development
 To do - Support ongoing fundraising activities
Dependencies on: Advancement team, Dlocal, Ingenico
Status
- July 2019 - In progress on all 3 points.
- August 21, 2019 - In progress on all 3 points.
- September 2019 - To do
Performance
Team Manager: Gilles Dubuc
Platform Evolution: Reduce complexity of the platform to make it easier for new developers to contribute.
 In progress - Improve the filtering of obsolete domains in GTIDs to avoid timeouts on GTID_WAIT (get reviewed and merged).
 Done - Support the Parsing Team with performance insights on the Parsoid-PHP rollout.
 In progress - Reduce reliance on master-DB writes for ResourceLoader file-dependency tracking (Multi-DC prep), T113916
 Done - Audit use of CSS image-embedding (improve page-load time by reducing the size of stylesheets), T121730
 In progress - Figure out the right store to use for the main stash (Dynamo? mcrouter?), T212129
 Not done - Swift cleanup + WebP ramp-up, T211661
Core: Maintain libraries for which Performance is currently responsible, evaluate libraries to determine whether they should be owned by another team, and hand them off to other teams when possible.
 Done - [Ongoing] Support and maintenance of MediaWiki's object caching and data access components.
 Done - [Ongoing] Support and maintenance of WebPageTest and synthetic testing infrastructure.
 Done - [Ongoing] Support and maintenance of MediaWiki's ResourceLoader.
 Done - [Ongoing] Support and maintenance of Fresnel.
 Done - Support AbuseFilterCachingParser development, T156095
Core: We can quickly detect performance regressions and better detect potential ones prior to deployment.
 Done - Add Grafana dashboard for WANObjectCache stats, T197849
Core: Create a culture of performance in Wikimedia
 Done - Write two performance topic blog posts.
 Done - Line up interested speakers for a FOSDEM Web Performance devroom proposal.
Dependencies on: SRE, CPT, Parsing
Status
- July 2019 - In progress
- August 2019 - To do
- September 2019 - To do
Release Engineering
Team Manager: Greg Grossmeier
Priority: Reduce complexity of the platform to make it easier for new developers to contribute.
 In progress - All applicable new and existing services (and partially MediaWiki) exist in the Deployment Pipeline
- Migrate restrouter: Done
- (Stretch) MobileContentService: In progress
- (Stretch) Preparatory MediaWiki config clean-up and static loading work: To do
 
 In progress - Actionable code health metrics are provided for code stewards
- Scope out requirements for a self-hosted version of SonarQube for our use: Postponed
- Expand the set of repositories covered by code health metrics (via SonarQube): Done
 
 In progress - Provide a standardized local MediaWiki development environment
- Migrate local-charts to deployment-charts, task T224935: In progress
- Instantiate testing and linting of Helm charts, task T217868: To do
- Preliminary work on a CLI for setup/management, task T224939: In progress
 
Dependencies on: SRE, Code Health Metrics WG
Core: Developers have a consistent and dependable deployment service.
 In progress - Iteratively improve our deployment tooling, service, and processes.
- Streamline the Kibana -> Phab error reporting workflow (using client-side code, at first): In progress
 
 To do - Align developer services with SRE best practices.
- Work with SRE to identify and implement the needs of Phabricator and Gerrit (expected to last into Q2): To do
 
Dependencies on: SRE, Performance
Core: Maintain and improve the Continuous Integration and Testing services
 To do - Maintain CI and testing services
- Scope updated CI/testing KPIs: To do
- Set up an experimental Elasticsearch instance to store and analyze CI logs and metrics: Stalled
 
 In progress - Evaluate, select, and implement a new CI infrastructure.
- POCs of GitLab and Zuul3 systems (as well as Argo); evaluate options: Partially done
- Document an implementable architecture for what we want in the new CI: In progress
 
Dependencies on: SRE/Others invested in CI architecture choices
Core: A clear set of unit, integration, and system testing tools is available for all supported engineering languages.
 In progress - Update the existing system test tooling and developer education.
- Update existing Selenium documentation (https://www.mediawiki.org/wiki/Selenium/Node.js): Done
 
Dependencies on: none.
Status
- July 25, 2019 - In progress
- Migrate restrouter: Done, and now in the Services team's hands
- Some portions of SonarQube are not open source, so we're looking into options
- Streamline the Kibana -> Phab error reporting workflow: now has a POC and should be deployed soon
 
- August 27, 2019 - In progress
- For the work to streamline the Kibana -> Phab error reporting workflow, we're looking at deploying Phatality
- POC for GitLab: Done; POC for Zuul3: Partially done
- Scoping out requirements for a self-hosted version of SonarQube is no longer stalled. Our strategy will use a combination of self-hosted and cloud-hosted instances, depending on the data. The self-hosted open source version does not support branch-level analysis, but we don't believe that will keep us from using it for non-branch-based analysis.
- Expand the set of repositories covered by code health metrics (via SonarQube): three new extensions will be added by the end of this month, with 3-6 more next month.
- Set up an experimental Elasticsearch instance to store and analyze CI logs and metrics: we met to discuss this under the "Data ^3" project and laid out some basic objectives for a POC; this work will continue into next quarter.
- Update the existing system test tooling and developer education:
- We worked with the Core Platform team on the analysis and selection of an integration test tool. We expect the Quality and Test Engineering team to take responsibility for this tooling once a SET is in place.
- The Code Health Metrics WG has spun off an effort (driven by WMDE) to separate existing MediaWiki unit tests from integration tests.
 
 
- September 2019 - In progress
- Actionable code health metrics are provided for code stewards
- We decided that, before investigating self-hosting SonarQube, we wanted to assess its current perceived value. As such, we will be interviewing teams that currently use SonarQube/SonarCloud as part of the Code Health Pipeline.
- We've been incrementally adding new repos to the Code Health Pipeline to avoid overloading CI. No issues so far. We aim to add all applicable repos by the end of Q2.
 
- A clear set of unit, integration, and system testing tools is available for all supported engineering languages.
- To date, we've established a set of tools that are used across the organization for unit- and system-level automated testing. The CPT team has evaluated and deployed an integration testing tool that we aim to make available more broadly. However, due to a lack of SET staffing, this is unlikely to happen this FY. Now that the new Quality and Test Engineering team has been formed, we will assess the state of tools used by other teams across the Foundation.
- The Selenium documentation has been updated.
- WebdriverIO has been upgraded from version 4 to 5 for Core. We will need to start planning the migration for the other repos.
 
 
Research
Team Manager: Leila Zia
 Done - [P-O14-D4] Run a series of interviews, office hours, or surveys to gather the volunteer editor community's input on citation-needed template recommendations. The results of this work will inform the specification of an API (to be developed) to surface citation-needed recommendations, as well as future directions for this research. task T228442
 Done - [P-O14-D4] Complete the research on characterizing Wikipedia citation usage (Why We Leave Wikipedia). This goal will continue in Q2 and, depending on the submission results, potentially in Q3. task T227790
 Done - [W-O6-D3] Computer vision consultation as part of Structured Data on Commons. task T228440
 Postponed - [P-O14-D6] Building a pipeline for image classification based on Commons categories. task T228441
 Done - [P-O14-D4] Make substantial progress towards a comprehensive literature review on automatic detection of misinformation and disinformation on the Web. We expect this work to be completed in Q2 and to inform the work in this direction in Q3+. task T229595
 Done - [P-O14-D4] Understand patrolling on Wikipedia: a write-up describing how patrolling is done on Wikipedia across languages. This work may be extended by understanding patrolling on Wikipedia in the context of Wikipedia's interaction with other projects such as Wikidata, Wikimedia Commons, ... task T228817
 Done - Conduct the analysis of reader surveys to understand the relation between demographics and the consumption of content on Wikipedia across languages (Why We Read Wikipedia + Demographics). This research will be concluded in Q2, and we expect substantial progress in Q1. task T228279
 Done - Hiring and onboarding. We expect 1-2 scientists to join the team in Q1, and their onboarding will need to happen. We also expect to open an engineering position on the team. task T229259
 Done - [T-O12-D3] Determine important features of articles with respect to the level of reader interest across different demographic groups (as motivation for what aspects a general article category model should capture). task T228319
 Done - Wrap up editor gender work. task T227793
Dependencies on: Product, Community Liaisons, and Structured Data teams
Status
- July 23, 2019 - In progress. Notes:
- Complete the research on characterizing Wikipedia citation usage: the bulk of the work will be done in Q1 and Q2, with submission in Q3.
- Computer vision consultation as part of Structured Data on Commons: work continues; the deadline is the end of the calendar year, and we are currently waiting on direction from Product.
- Building a pipeline for image classification based on Commons categories: this work is ongoing through this quarter and next.
- Comprehensive literature review about automatic detection of misinformation and disinformation: this work will continue, but it is not sustainable long-term without additional headcount for the team.
- Analysis of reader surveys to understand the relation between demographics and the consumption of content: we hope to present this at Wikimania 2019.
 
- August 27, 2019 - In progress. All goals are on track to be met by the end of September (end of quarter).
- September 2019 - To do
Scoring Platform
Team Manager: Aaron Halfaker
 To do - Build out the Jade API to support user actions
 To do - Build/improve models in response to community demand
 To do - Support operations infrastructure improvements (k8s, redis SPOF)
Dependencies on: SRE
Status
- July 2019 - To do
- August 2019 - To do
- September 2019 - To do
Search Platform
Team Manager: Guillaume Lederrey
Reduce complexity of the platform: Reduce technical debt and increase automation to reduce workload and make it easier to add new search features
- Refactor query highlighting to make it extensible by other extensions, task T190130: In progress
- Refactor Mjolnir jobs into separate smaller jobs: In progress
 
Core work: Maintain CirrusSearch and the Search API and WDQS
- Core maintenance work (always In progress)
- Improve WDQS updater performance by writing custom code for updates, task T212826: In progress
- Full data reimport for WDQS to enable optimizations that were done last quarter: Done
- Work through the backlog of bugs and performance improvements for WDQS with our contractor: Done
- Start the hiring process for a new WDQS engineer: Done
- Hardware renewal, replace elastic1017-1031, task T226843: In progress
 
Continue to identify and enable machine learning and natural language processing techniques to improve the quality of search
- "Did you mean" suggestions, deploy method0 to production: In progress
 
Underserved communities benefit from search techniques that to date are only used on big wikis, like machine learning–aided ranking, word embeddings, or language-specific analyzers: Language analysis / Phab work
- Work on the highest-priority language tickets (Discovery Search board / Language Stuff; always In progress)
 
Structured Data on Commons support (as needed)
- RDF export, task T221916: In progress
- Address the indexing issues of MediaInfo (labels vs. descriptions), task T226722: Done
 
Dependencies on: RDF export: WMDE / Wikidata, Hardware renewal: DC Ops, MediaInfo indexing: SDoC
Status
- July 30, 2019 - In progress
- The hiring process for the WDQS engineer is in full swing, with lots of folks applying!
- Hardware renewal: In progress; we're getting quotes
 
- August 27, 2019 - In progress
- Refactor query highlighting: still In progress, with lots of patches being uploaded (Phab ticket added above)
- Refactoring Mjolnir jobs into separate smaller jobs was slow going last month, but we should be able to tackle it in September.
- Improving WDQS updater performance by writing custom code for updates should be done by the end of September.
- We'll be taking another look at the backlog of bugs and performance improvements for WDQS this week: In progress
- Hardware renewal is ongoing; we're waiting for the servers to be racked and set up.
- Language work is continuing:
- Slovak diacriticless search is waiting for community feedback, T223787: In progress
- Highlighting for CirrusSearch results now respects grapheme clusters, T35242: Done
- Related patches for the jQuery and OOUI libraries are awaiting feedback (same task): In progress
 
- Improvements to Khmer searching are ongoing, T185721: In progress
 
- RDF export is also still In progress, with patches being tested and merged when good; we're working with WMDE on this
 
- September 2019 - In progress
- Refactoring query highlighting: In progress, but expected to be done (last patch in progress)
- Refactoring Mjolnir: In progress, almost done; expected to run 1-2 weeks into Q2
- Improve WDQS updater performance: still In progress
- Work through the backlog of WDQS bugs: Done (the bugs were worked through)
- Hiring process for WDQS engineer: Done (process started)
- Hardware renewal: In progress; delivered but not racked, will spill into Q2
- DYM, deploy M0: In progress; the A/B test will run this week
- RDF export: unclear who owns it
 
Security
Team Manager: John Bennett
Core
 In progress - Finalize and publish service catalog
 In progress - Draft new employee security awareness content
 To do - Create initial set of security measurements and metrics
 In progress - Create initial version of PHP security toolkit
 Stalled - Create design document for how DAST will work
 Done - Create team learning circles
 In progress - Publication of security team roadmap
 To do - Release of Phan 2.x
 To do - Security release
 To do - Bug Bounty SOP
 To do - Deploy StopForumSpam
 In progress - Draft 3 new security policies
 In progress - Draft 3 new Security Incident Response playbooks
 To do - Socialize Corrective Action plan for Security Incidents
 Stalled - Incident response tabletop exercise and updates to security after-action reports and improvement plans
 To do - Discovery ticket for ElastAlert detection and alerting
 Stalled - Phishing security awareness: at least 2 completed phishing campaigns
 Done - Team retro; implement agile ceremonies for appsec-related projects
 In progress - Publish data protection and retention guidelines
 In progress - Create privacy engineering charter
 In progress - Update data classification policy
 In progress - Publication of privacy review template
Dependencies on: New employee security awareness content needs integration with OIT onboarding and the new account process.
Status
- July 25, 2019 - In progress
- A draft service catalog and 4-5 service descriptions are being written and are scheduled for release at the end of the quarter.
- New employee security awareness content will bolt on to the OIT new employee process. Content is being prepared, and we hope to deploy it this quarter.
- Initial measurements around the number of concept reviews for both appsec and privacy engineering will be collected this quarter.
- Ongoing work on creating appsec automation via the PHP security toolkit
- Ongoing work and investigation on how a DAST solution could fit into our appsec pipeline
- Security team roadmap is being built in Asana and will be published on office wiki this quarter.
- Lots of work in the data protection and privacy engineering space.
 
- August 26, 2019 - In progress
- Draft service descriptions and catalog created; on target for publication at the end of the quarter
- We will be sidelining DAST work for this quarter due to bandwidth
- Learning circles/skill matrix created
- Draft version of roadmap created; on target for publication at the end of the quarter
- The security tabletop exercise will be sidelined this quarter to work on the corrective action plan and playbooks
- Phishing campaign stalled and will likely be abandoned this quarter
- Team retro completed and monthly appsec retro created.
- All privacy related work currently on target.
 
- September 2019 - To do
Site Reliability Engineering
Directors: Mark Bergsma and Faidon Liambotis
Cross-cutting
 Partially done - Firefighting improvements, ONFIRE (continuation)
- Produce a standardized template for a status document for ongoing major incidents: Done
- Iterate on a process for running the incident documentation review board; review 90% of incident documents written this quarter: Partially done
- [stretch] Research possible implementations for synchronizing team contact information to everyone's phone: To do
 
 Done - Database automation (continuation)
- Productionize dbctl (deploy, import data, set up alerts)
- Set up MediaWiki to optionally read the database configuration from etcd
- Gradually migrate all MediaWiki instances to read the database configuration from etcd
 
 
Service Operations
Team Manager: Mark Bergsma
 In progress - Complete the transition to PHP 7 in production
- Move all application server and API traffic to PHP 7: In progress
- Move maintenance scripts to PHP 7: Done
- Move jobrunners to PHP 7: Done
- [stretch] Remove HHVM from production: To do
 
 In progress - Self-service Deployment Pipeline
- Define and document the process for service owners to deploy a new service onto the pipeline: To do
- Support the migration of the RESTrouter and wikifeeds services by their service owners: In progress
 
Dependencies on: Release Engineering, Core Platform, Performance
Data Persistence
Team Manager: Mark Bergsma
 Partially done Address Database infrastructure blockers on datacenter switchover
- Order, rack and setup 10 new hosts in codfw  Done
- Failover all codfw masters  In progress
- Failover eqiad masters to new hosts and decommission old masters  Partially done
- [stretch] Deploy codfw non-Mediawiki database proxies  Done
 
 
 In progress Strengthen backup infrastructure and support
- Deploy new Bacula hardware  In progress
- Transfer ownership and knowledge of Bacula backup infrastructure
- [stretch] Migrate general backup service from old to new host(s)
 
 
Traffic
Team Manager: Brandon Black
 To do Create usable TLS ciphersuite dashboard (continued)
- Decide on Prometheus vs Webrequest
- Send all the right data from the cp boxes upstream
- Make useful charts and graphs that can correlate ciphers to UA, Geo, ASN, etc.
 
 
 In progress Finish TLS deployment via ATS
- Continuation of previous Q goal  In progress
- Switch production edge TLS termination to ATS  In progress
- [stretch] Support TLS 1.3  To do
 
 
 Partially done ATS Backends: Test live cache_text traffic
- Implement basic TLS termination for cache_text services (may not be final solution w/ real PKI)  Partially done
- Begin testing a small fraction of live cache_text traffic through ATS backends  Done
 
 
 To do AuthDNS: Implement smooth geoip repooling solution
- Design new dynamic response architecture for future needs
- MVP/Draft code for geoip smooth repooling using above
- [stretch] release code, use in production
 
 
 Partially done Deploy anycast recdns to all production
- Finish evaluating current running implementation under live test  Done
- Implement any minor improvements we need  Done
- Switch most production hosts to using anycast recdns @ 10.3.0.1  In progress
 
 
Infrastructure Foundations
Team Manager: Faidon Liambotis
 In progress Puppet 5 (continuation & wrap-up)
- Upgrade all production Puppetmasters to Puppet 5.5
- Upgrade production PuppetDB to 6.2 in both data centers
 
 
 In progress Configuration management for network operations
- Productionize existing configuration management software (jnt)  In progress
- Integrate with Netbox for device selection and topology data gathering  To do
- Add safe push method for the configuration: interactive and sequential  To do
- [stretch] Evaluate Netbox to store network secrets  Done
 
 
 Partially done Bare metal cloud
- Import existing management interface IPs into Netbox  Done
- Automate the assignment of new hosts' management interface IPs  In progress
- Automate the generation of management interface DNS records  In progress
 
 
 In progress Identity Management & Single Sign On
- Build a production prototype of an Apereo CAS identity provider  Stalled
- Switch (at least) one service to authenticate against the identity provider  Stalled
 
 
Observability
Team Manager: Faidon Liambotis
 Partially done Improve our alerting capabilities
- Produce and circulate an alerting infrastructure roadmap  Done
- Establish periodic alert reviews, complete one by EOQ  In progress
- Reduce Icinga alert noise  In progress (ongoing)
 
 
 In progress Tech debt: sunsetting of Graphite (part 1)
- Deprecate statsd: fully migrate >= 30% of producers off statsd  In progress
- [stretch] Deploy Thanos (long-term storage) stateless components: sidecar and query  To do
 
 
Data Center Operations
Team Manager: Willy Pao
 Partially done Refine procurement process
- Improve average end-to-end turnaround time from hardware request to hardware delivery  Partially done (always in progress)
- Tighten up procurement cycle by implementing regularly scheduled deadlines for quotes, approvals, and purchase orders  Partially done
- Implement general template form for service owners to fill in  Done
 
 
 In progress Improve turnaround times on repair/break-fix tasks
- Implement a new hardware repair template & refine existing triaging processes  Partially done
- Enforce regular use of hardware troubleshooting runbook  To do
- Hire and on-board a contractor for additional support in eqiad  Done
- Identify 3rd party contractor to take care of straightforward tasks at remote caching sites  Done
 
 
 In progress Operational excellence: resolve all inventory inconsistencies
- Clean up existing backlog of Netbox inconsistencies and data errors  In progress
- Keep all Netbox reports in a "passed" state  To do
- Maintain zero error reports going forward  To do
 
 
 In progress Recycle all existing decommissioned hardware
- Clear out existing decommissioned hardware in ulsfo and codfw  In progress
- Determine alternative disposition company for Juniper equipment  To do
 
 
Status
- July 23, 2019 - In progress
- Complete the transition to PHP 7 in production is currently partially blocked
- Self-service Deployment Pipeline draft has been posted
- Refine procurement process is in testing right now (2 week cycle)
- Improve turnaround times on repair/break-fix tasks is also in progress with a new hire
- Recycle all existing decommissioned hardware is in progress; quotes for the work are being gathered
 
- September 10, 2019 - In progress
- Firefighting improvements are in progress and database automation is done
- Moving all application server traffic to PHP 7 is at about 33% and should reach 50% later this week
- Support migration of services for the deployment pipeline is in progress and might move into next quarter
- Strengthen backup infrastructure and support is now in progress
- Additional notes are inline above
 
- September 2019 - To do
Technical Engagement
Team Manager: Birgit Müller
Core
 Done - HA for OpenStack API endpoints (keystone, glance, nova, designate)
 Partially done - OpenStack version upgrade(s) – tbc in Q2
 Partially done - Jessie deprecation (infra + Cloud VPS) – tbc in Q2
 Stalled - Ceph cluster POC
 In progress - Improve Cloud VPS documentation (for users) – tbc in Q2
 In progress - Toolforge Kubernetes redesign/upgrade
 In progress - Improve Toolforge documentation – tbc in Q2
Increased visibility & knowledge of technical contributions, services and consumers across the Wikimedia ecosystem (Reduce Complexity of the Platform, Movement Diversity)
 In progress - Continue Tech Talks
 Done - Conduct Coolest Tool Award
 In progress - Publish Technical Contributors Map
 To do - Blog posts on Small Wiki Toolkits & Coolest Tool Award
 Done - Design & publish Tech Engagement quarterly newsletter
 In progress - Develop visualization tool for WMCS edit data – tbc in Q2
 To do - Publish Developer Metrics
Support Wikimedia's diverse technical communities (Reduce Complexity of the Platform; Movement Diversity)
 In progress - Develop support formats: Coordinate Small Wiki Toolkits focus area; Create toolkits & experiment, evaluate
 Partially done - Technical internships and mentoring: Mentor students in GSOD, GSOC, Outreachy
- Always in progress - Provide continuous bug management support in Phabricator
Dependencies for core work are on: SRE/Data Center Operations team
Status
- July 23, 2019 - In progress as marked above
- August 23, 2019
- HA for OpenStack API endpoints - we'll have a good plan going forward after the offsite and should be able to finish everything by the end of the quarter.
- Jessie deprecation (talking with the community) will get started in the next week or so; it remains in To do status for now and will continue into Q2 as the conversation continues.
- Ceph cluster - currently waiting on hardware to be installed, stalled for now
- Toolforge Kubernetes redesign/upgrade is still in progress (puppetization is in place, and work is now focused on customizations) but will extend into Q2 due to conferences, etc.
- We've also been onboarding a new team member this quarter.
 
- September 2019
- Completed HA for OpenStack API endpoints. Glance is active/passive rather than active/active for now due to the lack of a good shared storage option. Will revisit after the Ceph cluster project is complete.
- OpenStack upgrades from Mitaka to Newton are partially complete. Expect to finish by mid-October.
- The Q1 Jessie deprecation goal of notifying the community about the project and creating a tracking dashboard is complete. The project continues in Q2.
 

