Skip to content

"Trends and Information on AI, Big Data, Data Science, New Data Management Technologies, and Innovation."

This is the Industry Watch blog. To see the complete ODBMS.org
website with useful articles, downloads and industry information, please click here.

Apr 7 21

On the new Tortoise Global AI Index. Interview with Alexandra Mousavizadeh.

by Roberto V. Zicari

“I think the conversation is really about a shift of where funding is coming from. Governments are spending far less than the big tech platforms. What this tells us about who owns the direction of travel of AI is fascinating. Are we now in a position where the power that the public sector was able to deploy in the past is massively outgunned by private companies and their R&D budgets?” — Alexandra Mousavizadeh.

This is my follow up interview with Alexandra Mousavizadeh,  Partner at Tortoise Media. We talked about the new version of the Global AI Index.

RVZ

Q1. In December 2020, Tortoise Media launched the Global AI Index to benchmark nations on their level of investment, innovation and implementation of artificial intelligence. Since then, your team has been working on expanding the Index. What are the main results of this new Index?

Alexandra Mousavizadeh: The most striking result is China’s rapid improvement. Although the gap between the top three is significant (the US is still streets ahead of China, and the UK lags far behind in third place) across our 143 indicators, China has made gains. That rise is mostly due to a serious boost in research: yet another Chinese university joined the Times’ list of top 100 computer science universities; the total number of citations from high-achieving Chinese computer science academics jumped by 67 per cent over the course of the year; the number of Chinese academic AI papers accepted by the IEEE – a body which sets AI standards and also publishes a number of influential AI journals – now out-do those by US academics by a factor of seven; and China overtook the US in terms of AI patents granted around two years ago and has been pulling further ahead ever since.

China is also pulling ahead in the roll-out of supercomputers, with almost twice as many super computers as the US, demonstrating its growing threat to the US’ AI supremacy.

The UK has slipped in some key metrics, and its lead over its closest competitors has narrowed. Although a new AI strategy has now been announced, it’s not been published yet. We have also seen British slippage across several different key parts of the framework; universities, supercomputing, research, patents and diversity of specialists.

Q2. What has been added to the Global AI Index of this year?

Alexandra Mousavizadeh: We’ve added Armenia, Bahrain, Chile, Colombia, Greece, Slovakia, Slovenia and Vietnam: each of these countries has recently published a strategic approach to artificial intelligence on a national level, and therefore ‘qualified’ for assessment under The Global AI Index framework. That brings the number of countries assessed to 62 overall, up from 54 last year.

We’ve also developed all-new national AI dashboards: policy makers can monitor their national AI activity across all 143 indicators in real-time. These dashboards, the first of their kind, can also simulate the impact of policies via the target-setting feature, which calculates hypothetical ranks and scores based on chosen policy targets.

Q3. What new metrics did you introduce and why?

Alexandra Mousavizadeh: We’ve added a range of new metrics to deepen our measurements across many of the pillars of the index. In Talent, we have incorporated data provided by Coursera, showing the level of enrollment and activity on online learning courses specific to ‘artificial intelligence’ and ‘machine learning’. This data-set fills in gaps in countries like India, China and Russia – where our other metrics were not as comprehensive – by acting as a proxy for the level of online learning taking place.

We’ve also refined some of our existing metrics to increase the accuracy of our data. We’ve replaced the Open Data Barometer data-set, which was outdated in many respects, with measures from the OECD OURdata Index to better reflect the level of open data use and suitability. The new source is much more recent and relevant. We’ve also made our measures of 5G implementation more granular, reflecting the actual level of supported networks in a given country. This is a crucial leading indicator for the capacity to adopt artificial intelligence more widely in a given country,  so this change offers a lot more clarity in the Infrastructure pillar of the index.

In our forthcoming update for the index in May, we’ll be adding more indicators on diversity issues. These will complement a component of the index that deals with regulation and ethics.

Q4. Who are the biggest risers on the Index?

 Alexandra Mousavizadeh: Israel has surged up the rankings from 12th to 5th place. It improved its talent rank, with more R package downloads, an increase in stack overflow questions and answers, and a rise in GitHub commits. These are important indicators because they speak to developments in a country’s coding community beyond the formal education sector. It also still has the highest number of startups as a proportion of the population – 3 AI startups for every 100,000 people (compared with 5 startups for every million people in the US). It’s an impressive feat for a small country, but its rise is driven in part by the number of proportional, or intensity-based, metrics in the index (which favour small countries), and partly by changes in our methodology that more accurately capture the number of developers and other specialists on social media.

Also of note is Finland, which has procured a pre-exascale supercomputer and substantially increased its coding activity. Use of Python, R and commits on GitHub have grown, as well as our measures of GitHubs Stars and Stack Overflow Questions/Answers. This is a notable result of Finland’s ongoing focus on skills development, driven by both government strategy and an excellent ecosystem, including the Finnish AI Accelerator, the Tampere AI Hub and the AI Academy at University of Turku. That complements exciting tech startups, including Rovio, Supercell, and CRF Health.

 The Netherlands is the biggest riser; helped by a slow-down in some of the countries that previously ranked above them (Japan), The Netherlands have accelerated their coding and development activities and have high scores on the proportional Talent measures i.e. Number of Data Scientists per capita.

Q5. How has the pandemic influenced the global development of AI in the world? Do you have any insights to share on this?

Alexandra Mousavizadeh: It is difficult to untangle the exact effect of COVID on the global development of AI – a lot of the data is yet to come in. One area that has definitely suffered is start-ups. This time last year, the UK had 529 AI startups listed on Crunchbase, but it’s now dropped to 338. Other countries have seen similar collapses in startup numbers. There are some counterintuitive results: the 62 countries in our Index attracted similar overall levels of private funding for AI as in 2019. Several countries have seen a fall in investment, but these don’t necessarily match those that have been worst affected by the pandemic. Both the US and China, for instance, have seen a similar drop-off in funding, despite the US having a much worse pandemic overall.

The pandemic has also created challenges to which some countries have responded with AI. Coronavirus has become a focal point for Israel’s AI entrepreneurs, with several Israeli companies emerging as front-runners in areas like diagnostics, disease management and monitoring systems. Vocalis Health, for example, was launched by the defence ministry and is aiming to diagnose Covid effectively based on people’s speaking patterns.

Q6. The lack of transparency is a limiting factor for the effectiveness of the Index. For example in the case of Russia, much of its AI spending may be going to military purposes – and you can’t track it.  What is your take on this?

Alexandra Mousavizadeh: It’s true that there is AI spending and research that we can’t track, especially in countries that are less transparent, like Russia. We only use one proprietary data source for the Index – the Crunchbase API –  and the vast majority of the rest of our information is open source. Government spending on clandestine AI activity represents a small proportion of funding and progress in the AI space. Even if a country is spending quite large sums opaquely on covert AI, that spending is siloed, and often won’t contribute to a country’s overall progress in AI.

Q7. Isn’t this also the case for most of the other counties such as China, but even the USA?  They do not necessarily reveal their use of AI for military purposes.

 Alexandra Mousavizadeh: I think the conversation is really about a shift of where funding is coming from. Governments are spending far less than the big tech platforms. What this tells us about who owns the direction of travel of AI is fascinating. Are we now in a position where the power that the public sector was able to deploy in the past is massively outgunned by private companies and their R&D budgets? Amazon spends roughly ten times as much on R&D and infrastructure as DARPA’s total budget ($36bn and $3.5bn in 2019 respectively). The UK’s new research agency, ARIA, is set to have just £800m over the course of the next parliament.

Another related issue is that of selective publication. It’s well within a company’s rights to not release research that it carries out – but when that research has the potential to create massive public goods, it’s concerning that decisions are made in private by unaccountable tech companies.

Q9. How would you characterize the current geo-politics of artificial intelligence?

Alexandra Mousavizadeh: The countries that get on top of AI will accrue enormous benefits very quickly – not just in efficiency gains and cost reductions, but in transformative technologies that will increase almost every aspect of their global competitiveness. Although a lot of research that we’re currently seeing is business, not state-led, states can move rapidly if they have to.

Conventional wisdom says it’s a competitive environment, with some saying that we’re seeing a rise in AI nationalism. But the different gains and losses made by nations in different factors and metrics on the index show a complex picture of states investing in different areas. The World Economic Forum’s ‘Framework for Developing a National Artificial Intelligence Strategy’ highlights the collaborative role that many governments are aiming to play; co-designing, rather than merely responding to, technological change across multiple sectors. And many of the factors we track don’t stick to one country – talent crosses borders, and Github is global. A lot of the most interesting developments are iterated with open-source technology. So in many ways, the geo-politics of AI can be one of mutual benefit rather than a zero-sum environment.

It’s more complicated than any one story; but one narrative that we’re definitely seeing is two superpowers, the US and China, establishing themselves as dominant in the space. Then there are a number of specialist smaller states that could have a significant role to play in terms of standard-setting, and specialist AI in areas of national comparative advantage – take the UK and medtech for example. Finally, there are countries that want to be involved but aren’t currently in a place where their strategy (or level of investment) matches their ambitions. Those countries need to get serious, quickly.

AI is an accelerant – we run the risk of seeing clusters of AI excellence that exacerbate divides within and between nations, compounding existing inequalities and leaving those without skills and capital behind.

Q10. Carissa Veliz mentioned that “to ensure ethical behaviour around AI, certain behaviours should be banned by law” . Do you plan to include policy regulations in the future version of the Index?

Alexandra Mousavizadeh: The dashboards we developed have a set of implicit policy recommendations sitting behind the indicators; governments can see how their rankings would improve by using these features. But with regard to ethics, we spent hundreds of hours with the team discussing how regulation and ethics could feed into the index and concluded that it needed an in depth examination which warranted its own investigation. We are now doing that work. The May update to the index will contain a section on policy regulation, so stay tuned.

Having said this, there are some metrics in the Index which do address ethical considerations – such as the diversity of researchers in STEM subjects. This section will also be expanded in May.

Qx Anything else you wish to add?

Alexandra Mousavizadeh: We love hearing from our readers at Tortoise; the point is that we include them as part of the conversation. Our AI Sensemaker newsletter contains our and their thinking, keeping you posted on the latest developments in AI, and comes out once a fortnight. You can sign up here.  To get more involved, you can also apply to join our AI Network: it’s a global community of experts, policy makers, and business leaders, who take part in monthly round tables. These cutting-edge conversations set the pace for all things AI.

————————-

Alexandra

Alexandra Mousavizadeh is a Partner at Tortoise Media, running the Intelligence team which develops indices and data analytics. Creator of the recently released Responsibility100 Index and the new Global AI Index. She has 20 years’ experience in the ratings and index business and has worked extensively across the Middle East and Africa. Previously, she directed the expansion of the Legatum Institute’s flagship publication, The Prosperity Index, and all its bespoke metrics based analysis & policy design for governments. Prior roles include CEO of ARC Ratings, a global emerging markets based ratings agency; Sovereign Analyst for Moody’s covering Africa; and head of Country Risk Management, EMEA, Morgan Stanley.

Resources

The Global AI Index

The Global AI Index Methodology Report

The Tortoise Global AI Summit: the Readout. Thursday 10 December 2020

Related Posts

On The Global AI Index. Interview with Alexandra Mousavizadeh. ODBMS Industry Watch.by Roberto V. Zicari on January 18, 2020

Follow us on Twitter: @odbmsorg

##

Mar 18 21

On the Challenges Facing Financial Institutions. Interview with Joe Lichtenberg

by Roberto V. Zicari

“There are three factors C-suite executives need to consider when addressing operational resilience: the need to make better business decisions faster; improved automation and the elimination of manual processes; and the ability to respond to unexpected volume and valuation volatility.” –Joe Lichtenberg.

I have interviewed Joe Lichtenberg, responsible for product and industry marketing for data platform software at InterSystems.

RVZ

Q1. What are the main challenges financial institutions are facing right now?

Joe Lichtenberg: As financial services organizations are pushed to rapidly adapt due to the pandemic, they also want to gain a competitive edge, deliver more value to customers, reduce risk, and respond more quickly to the needs of distributed businesses. To not only stand out, but ultimately survive, financial services organizations have relied on their digital capabilities. For instance, many have adapted faster than anticipated and found ways to supplement traditional face-to-face customer service. As the volume of complex data grows and the need to use data for decision-making accelerates, it is becoming more difficult to reach their business goals and deliver differentiated service to customers at a faster rate.

Q2. What do you suggest to C-suite executives that could help them re-evaluate their operational resilience in light of increasing volumes and volatility, and the shift to a remote working environment (especially due to the COVID-19 crisis)?

Joe Lichtenberg: There are three factors C-suite executives need to consider when addressing operational resilience: the need to make better business decisions faster; improved automation and the elimination of manual processes; and the ability to respond to unexpected volume and valuation volatility. Executives need to prioritize their organizations’ ability to access and process a single representation of accurate, consistent, real-time and trusted data. The volatility and uncertainty fueled by the pandemic pushed organizations to rely on the vast amounts of data available to them to properly bolster resilience and adaptability.

From scenario planning to modeling enterprise risk and liquidity, regulatory compliance, and wealth management, access to accurate and current data can enable organizations to make smarter business decisions faster. Organizations need to streamline and accelerate operations by eliminating manual processes where possible and automating processes. Not only will this help increase speed and agility, but it will also reduce the delays and errors associated with manual processes. Finally, executives must look to ensure they have sufficient headroom, processing capabilities, and systems in place to foster agility and reliability and to respond to unexpected volatility.

Q3. What are the key challenges they face to keep pace with the ongoing market dynamics?

Joe Lichtenberg: The more data sources organizations have, the more complex their practices become. As data grows, so does the prevalence of data silos, making access to a single, trusted, and usable representation of the data challenging. Additionally, analytics are more difficult to perform with disorganized data, causing results to be less accurate, especially in regards to visibility, decision support, risk, compliance, and reporting. This issue is extremely important as organizations perform advanced analytics (e.g. machine learning), where access to large sets of clean, healthy data is required in order to build models that deliver accurate results.

Q4. Do you believe that capital markets are paying the price for delaying core investments into their data architectures?

Joe Lichtenberg: Established capital markets organizations have delayed some of their investments in data architectures for a variety of reasons, and that move has ultimately kept operational costs in check, as large changes could drastically disrupt workflows and set them back further in the short term. These organizations typically have well-established core infrastructures in place that have served them well. Over the years, they have been expanded on, which means introducing significant changes is a complicated and complex process. However, the combination of unprecedented volatility, rising customer expectations, and competition from niche financial technology companies – that are providing new services – are straining the limits of these systems and pushing established firms to modernize faster, using microservices, APIs, and AI. In fact, in some cases financial organizations are outsourcing non-core capabilities to FinTechs. The FinTechs, not burdened by legacy infrastructure, are able to innovate quickly but may not have the breadth, depth, or resilience of the established firms. As capital markets firms modernize their data architecture, replacing these systems can lead to greater downtime that can slow and stall modernization efforts. Implementing a data fabric enables organizations to modernize without costly rip-and-replace methods and empowers them to address siloed legacy applications while existing systems remain in place.

Q5. What is a “data fabric”?

Joe Lichtenberg: A data fabric is a reference architecture that provides the capabilities required to discover, connect, integrate, transform, analyze, manage, and utilize enterprise data assets. It enables the business to meet its myriad of business goals faster and with less complexity than legacy technologies. It connects disparate data and applications, including on-premises, from partners, and in the public cloud. An enterprise data fabric combines several data management technologies, including database management, data integration, data transformation, pipelining, API management, etc.

A data fabric addresses many of the limitations of data warehouses and data lakes and brings on a new wave of redesign to the modern data architecture to create a more dynamic system according to Gartner. A smart data fabric extends the capabilities to include a wide array of analytics capabilities – eliminating the complexity and delays associated with traditional approaches like data lakes that require moving data to yet another environment.

Q6. How can firms eliminate the friction that has been built up around accessing information, reduce the cost and complexity of data wrangling?

Joe Lichtenberg: Building a smart enterprise data fabric as a data layer to serve the organization enables firms to reduce complexity, speed development, accelerate time to value, simplify maintenance and operations, and lower total cost of ownership. Additionally, it enables organizations to execute analytics and programmatic actions on demand, by utilizing clean and current data that resides within the organization

Q7. What are your recommendations for firms that need to work with competitive insights?

Joe Lichtenberg: Competitive insights require access to accurate and current data that may reside in different silos in order to get maximum value. A data fabric provides the necessary access to this required data, but a smart data fabric takes this a step further. It incorporates a wide range of analytics capabilities, including data exploration, business intelligence, natural language processing, and machine learning, that enable organizations to visaulize, drill into and explore, and combine the data from different sources. This helps not just skilled developers, data stewards and analysts, but a wide range of users that are close to the business to gain new insights to guide business decisions and create intelligent prescriptive services and applications.

Q8. Where do you see Artificial intelligence and machine learning technologies playing a role for financial institutions?

Joe Lichtenberg: Advanced analytics are essential to the future success of financial institutions. AI, ML, and natural language processing (NLP) tools are already being utilized in various areas of financial services. Although some may argue that niche FinTechs lead the way in the adoption of these tools, established organizations are also utilizing AI and machine learning to increase wallet share, enhance customer engagement, and guide strategic decisions. However, these tools are only as effective as the data that powers them. Without healthy data, they can’t deliver accurate results. That is why it’s essential to place an emphasis on the quality of data that is collected and fed into these powerful tools.

Q9. The next generation of technology advancement must be built on strong data foundations. Artificial intelligence and machine learning require a high volume of current, clean, normalised data from across the relevant silos of a business to functions. How is it possible to deliver this data without requiring an entire structural rebuild of every enterprise data store?

Joe Lichtenberg: This is a common initiative for which a smart, enterprise data fabric is being used. But implementing such a reference architecture can be complex, requiring implementing and integrating many different data management technologies. A modern data platform that combines multiple layers and capabilities in a single product, reducing complexity by minimizing the number of products and technologies required, are helping to deliver critical business benefits with a simpler architecture, faster time to value, and lower total cost of ownership. For example, modern data platforms combine horizontally scalable, transactional and analytic database management capabilities, data and application integration functionality, data persistence, API management, analytics, machine learning, and business intelligence in a single product built from the ground up on a common architecture. Not only can the implementation of a smart enterprise data fabric with a modern data platform at the core help firms address current pain points, it accelerates the move toward a digital future without the costly rip-and-replace of their current operational infrastructure.

———————

Unknown-1

Joe Lichtenberg is responsible for product and industry marketing for data platform software at InterSystems. Joe has decades of experience working with various data management, analytics, and cloud computing technology providers.

Resources

Five Key Reasons to Invest in a Smart Data Fabric. InterSystems.

–  Accelerate Your Enterprise Data Initiatives with a Smart Data Fabric. InterSystems. (Download .PDF Link Registration required)

Related Posts

– On AI for Insurance and Risk Management. Interview with Sastry Durvasula. by Roberto V. Zicari on February 13, 2020

Follow us on Twitter: @odbmsorg

Nov 23 20

On Digital Transformation and Ethics. Interview with Eberhard Schnebel

by Roberto V. Zicari

” Whether an organization is transparent and fair is, first of all, a very emotional category. This is something that customers and strategic partners decide by their own judgment and feelings.
The emotional communication of these values is the central task. 
However, there must also be minimum standards. It must be possible to check whether digital processes are fair and transparent. An inspection can be carried out for these minimum standards either as certification or as an onsite inspection. This depends on the complexity of the systems.” — Eberhard Schnebel.

I have interviewed Eberhard Schnebel, Group Risk Management at Commerzbank AG, where he led the project “Data Ethics / Digital Ethics” to establish aspects of corporate ethics regarding digitalization, Big Data and AI in the financial industry. We talked about Digital Transformation and Ethics.

Stay safe!

RVZ

Q1. What is Digital Ethics?

Eberhard Schnebel: Digital Ethics is the communication of applying ethical or social ideas to digital technology and making them a reality with digital technology. In the past, this was done in the context of Technology Assessment. The social and technological aspects are strictly separated, and digital technology’s instrumental use is systematically assessed from an ethical perspective.
Today’s approach sees digital development as an integrated part of our social life. It is important to transfer our social ideas into this digital reality and understand which digital contexts we would like to change.
This new approach understands Digital Ethics as part of the digital transformation itself. Digital Ethics thus directly influences our life with digital technology.

Q2. Why should Data Ethics and Digital Ethics become a core element of a company’s digital strategy?

Eberhard Schnebel: Data, BigData, and digital technologies are changing the core elements of many companies’ business models. New communication and risk management tasks are emerging. These must be taken into account in many areas and professionally integrated into the organization.
Data Ethics and Digital Ethics create exactly this extension of organizational routines, which will contribute to the company’s success in the future. They create awareness of these elements’ new reality within the company – from marketing and customer communication to product design and data scientists.

Q3. In Europe, there are many legal precautions and increasing rules on how to adopt the issues of digital transformation and data ethics, e.g., EU-GDPR, EU-Trustworthy-AI, the recommendations of the Data Ethics Commission of the German government. What does this mean for an organization?

Eberhard Schnebel: Dealing with these developments and regulations is very important, given the rapid development of data technology and the rapid new possibilities for building up business models. There are always many grey areas where technology needs completely new solutions or where regulations have become blurred. Therefore, organizations must go far beyond a pure compliance system and consider integrating the requirements into a system as soon as they design it. In addition to pure “legal compliance”, organizations need an ethical policy that defines how employees must deal with fuzzy requirements. Such a policy helps to avoid various risks that would result from software development.
This is about creating the organizational conditions for living these ideas and increased awareness and readiness for social concerns in concrete terms.

Q4 How is it possible for companies to establish Data Ethics and Digital Ethics as an essential factor in their business model?

Eberhard Schnebel: Data Ethics and Digital Ethics define what needs to be done and what does not. On the other hand, they also define how products are set up and designed, and used. This is important to be able to sell everything to customers later in a way that satisfies them. When everyone – customers, employees, and management – has the same understanding of the company’s digital products, services, and tools.
Besides, the company must also make central organizational preparations, such as setting up a governance board or creating a framework for managing ethical risks.

Q5. What are the key challenges to achieving this in practice?

Eberhard Schnebel: Everyone who comes into contact with data or AI products should understand the framework of regulations applicable to a company’s data and AI ethics. We create a culture in which a data and AI ethics strategy can be successfully implemented and maintained. Therefore, it is necessary to educate and train employees, enabling them to raise important issues at key points and bring the main concerns to the appropriate advisory body.

Q6. What does “transparency” in the application practice mean for an organization?

Eberhard Schnebel: In an organization or company, “transparency” is very much about communication and less about the actual facts. Transparency is anything where others believe that they can sufficiently understand and comprehend your reasons and background. The creation of an emotional connection is the central point here. At the same time, it must be clear to the Governance Board and product designers and data scientists what it means to make this transparency factual.

Q7. What does “fairness” mean in the daily routine for an organization?

Eberhard Schnebel: Fairness is the most challenging term in Digital Ethics. Because we are seeing more and more transparently the tensions between the individual elements of digitization created by transparency, this must be offset by conveying fairness.
But fairness is also when new information asymmetries between companies and customers are used so that they always lead to an advantage for the customer. This advantage must, of course, be experienced by them exactly as such, which in turn requires very emotional communication. The tension between transparency and fairness, in turn, is a real ethical debate (see Nuffield Report).

Q8. Who will review whether an organization is transparent and fair?

Eberhard Schnebel: Whether an organization is transparent and fair is, first of all, a very emotional category. This is something that customers and strategic partners decide by their own judgment and feelings. The emotional communication of these values is the central task.
However, there must also be minimum standards. It must be possible to check whether digital processes are fair and transparent. An inspection can be carried out for these minimum standards either as certification or as an onsite inspection. This depends on the complexity of the systems.

Q9 Hardly any other technology is as cross-product and cross-industry as Artificial Intelligence (AI). Is it possible to achieve responsible use of AI?

Eberhard Schnebel: Yes, I think so, but this can only happen from within and not as a “regulation”. Just as the “honorable merchant” is supposed to lead to ensuring responsible use of the economy, an inner understanding of digitization can also emerge.
To achieve this, we must ensure that this technology’s initial fascination, which still embraces us, is transformed. It must give way to a more sober view of what can be expected and what it will be used for. We must find quality criteria that describe its “goodness”.
But we also need visible and comprehensible analyses of the systems, including discussing the ethical tasks involved. Here, a new system of ethical screening can provide the insights needed for societal evaluation and discussion.

Q10. In your new book (*), you talk about “Management by Digital Excellence Ethics”. What do you mean by this?

Eberhard Schnebel: In the end, three building blocks must come together in Digital Ethics:
1. Digital Ethics adjusted compliance system that ensures that minimum standards are met, and risks are avoided.
2. Analytical module that ensures the necessary transparency and flow of ethical information for system design.
3. Commitment of management and product designers to communicate this ethical content to customers and business partners.
Suppose the link can be made between digital technologies’ quality use, appropriate meaning, and communicating their benefits. In that case, this is Digital Excellence Ethics because it successfully translates the excellent use of digital technology to integrate social ideas into business models.

Q11 Do you have anything to add?

Eberhard Schnebel: Again, we want to encourage the connection between ethical instruments and ethical risk management to be communicative and not just organizational with this book. Those involved must be emotionally involved. If we want to find Digital Ethics “on the engine” in the end, then only as an inner ethical element, as mindfulness.

———————————————————–

Eberhard_Schnebel-35-a-705x705
Dr. habil. Eberhard Schnebel
Ethical Advisor. Frankfurt Big Data Lab, Goethe University Frankfurt, Germany.

Eberhard Schnebel is a philosopher, economist and theologian. He received his PhD in 1995 from the Faculty of Theology of the LMU in Munich and his habilitation in 2013 at the Faculty of Philosophy of the LMU. His work combines Theory of Action, System-Theory and Ethics as foundations for responsibility in business and management.
Eberhard Schnebel is teaching Business Ethics at Goethe University in Frankfurt since 2013. Since 2016 he is developing “Digital Ethics” to integrate ethical communication structures into design processes of digitization and Artificial Intelligence.

He is a member of the Executive Committee of the “European Business Ethics Network” (EBEN). This network is engaged in the establishment of a common European understanding of ethics as prerequisite for a converging European economy. He is also member of the research team on Z-Inspection®, working on assessing Trustworthy AI in practice.

Eberhard Schnebel works full-time at Commerzbank AG, Group Risk Management, where he led the project “Data Ethics / Digital Ethics” to establish aspects of corporate ethics regarding digitalization, Big Data and AI in the financial industry. Prior to that, he led the project “Business Ethics and Finance Ethics” to introduce ethics as a tool for increasing management efficiency and accountability.

Resources
(*) A Valuable Future. Digital Ethics. by Dr. Eberhard Schnebel and Thomas Szabo. digital excellence, July 2020

Ethical Implications of AI, Series of Lectures— Open to all- no fees. Videos, Slides, Reports/Papers classified by topics.

Z-Inspection®: A process to assess Trustworthy AI.

Related Posts

– On Digital Transformation, Big Data, Advanced Analytics, AI for the Financial Sector. Interview with Kerem Tomak. ODBMS Industry Watch, July 8, 2019.

Follow us on Twitter: odbmsorg

Nov 6 20

Quantum Computer Systems. Interview with Fred Chong and Yongshan Ding

by Roberto V. Zicari

” Quantum computing is incredibly exciting because it is the only technology that we know of that could fundamentally change what is practically computable, and this could soon change the foundations of chemistry, materials science, biology, medicine, and agriculture.” — Fred Chong.

I have interviewed Fred ChongSeymour Goodman Professor in the Department of Computer Science at the University of Chicago and Yongshan Ding,  PhD candidate in the Department of Computer Science at the University of Chicago, advised by Fred Chong. They just published an interesting book on Quantum Computer Systems.

RVZ

Q1. What is quantum computing?

Ding: Quantum computing is a model of computation that exploits the unusual behavior of quantum mechanical systems (e.g., particles at minimal energy and distance scales) for storing and manipulating information. Such quantum systems have significant computational potential, as they allow a quantum computer to operate on an exponentially large computational space, offering efficient solutions to problems that seem to be intractable in the classical computing paradigm.

Q2. What are the potential benefits of this new paradigm of computing?

Chong:  Quantum computing is incredibly exciting because it is the only technology that we know of that could fundamentally change what is practically computable, and this could soon change the foundations of chemistry, materials science, biology, medicine, and agriculture.

Quantum computing is the only technology in which every device that we add to a machine doubles the potential computing power of the machine. If we can overcome the challenges in developing practical algorithms, software, and machines, quantum computing could solve some problems where computation grows too quickly (exponentially in the size of the input) for classical machines.

In the short term, quantum computing will change our understanding of the aforementioned sciences that fundamentally rely on understanding the behavior of electrons. A classical computer uses an exponential number of bits (electrons) to model the positions of electrons and how they change. Obviously, nature only uses one electron to “model” each electron in a molecule. Quantum computers will use only a small (constant) number of electrons to model molecules

Q3. What is the current status of research in quantum computing?

Ding: It is no doubt an exciting time for quantum computing. Research institutions and technology companies worldwide are racing toward practical-scale, fully programmable quantum computers. Many others, although not building prototypes by themselves, are joining the force by investing in the field of quantum computing. Just last year, Google used their Sycamore prototype to demonstrate a first “quantum supremacy” experiment, performing a 200-second computation that would otherwise take days on a classical supercomputer[1],[2]. We have entered, according to John Preskill, the long-time leader in QC, a “noisy intermediate-scale quantum” (NISQ) technology era, in which non-error-corrected systems are used to implement quantum simulations and algorithms. In our book, we discuss several of the recent advances in NISQ algorithm implementations, software/hardware interface, and qubit technologies, and highlight what roles computer scientists and engineers can play to enable practical-scale quantum computing.

Q4. What are the key principles of quantum theory that have direct implications for quantum computing?

Chong:  Paraphrasing our research teammate, Peter Shor, quantum computing derives its advantage from the combination of three properties:  an exponentially-large state space, superposition, and interference.  Each quantum bit you add to a system increases the state of the machine by 2X, resulting in exponential growth with the number of qubits.  These states can exist simultaneously in superposition and manipulating these states creates interference in these states, resulting in patterns that can be used to solve problems such as the factoring of large numbers with Shor’s algorithm.

Q5. What are the challenges in designing practical quantum programs?

Chong: There are only a small number of quantum algorithms that have an advantage over classical computation.  A practical quantum program must use these algorithms as kernels to solve a practical problem.  Moreover, quantum computers can only take a small number of bits as input and return a small number of bits as output.  This input-output limitation constrains the kinds of problems we can solve.  Having said that, there are still some large classes of problems that may be solved by quantum programs, such as problems in optimization, learning, chemistry and physical simulation.

Q6. What are the main challenges in building a scalable software systems stack for quantum computing?

Ding: A quantum computer implements a fundamentally different model of computation than a modern classical computer does. It would be surprising if the exact design of a computer architecture would extend well for a quantum computer. The architecture of a quantum computer resembles a classical computer in the 1950s, where device constraints are so high that the full-stack sharing of information is required from algorithms to devices. Some example constraints include hardware noises, qubit connectivity/communication, no copying of data, and probabilistic outcomes. In the near-term, due to these constraints, it is challenging for a quantum computer to follow the modularity and layering models, as seen in classical architectures.

Q7. What kind of hardware is required to run quantum programs?

Chong:  There are actually several technologies that may run practical quantum programs in the future.The leading ones right now are superconducting devices and trapped ions.  Optical devices and neutral atoms are also promising in the future.  Majorana devices are a further future alternative that, if realized, could have substantially better reliability than our current options.

Q8. Quantum computers are notoriously susceptible to making errors (*). Is it possible to mitigate this?

Ding: Research into the error mitigation and correction of quantum computation has produced several promising approaches and motivated inter-disciplinary collaboration – on the device side, researchers have learned techniques such as dynamical decoupling and its generalizations, introducing additional pulses into the systems that filter noise from a signal; on the systems software side, a compiler tool-flow that is aware of device noise and application structure can significantly improve the program success rate; on the application side, algorithms tailored for a near-term device can circumvent noise by a classical-quantum hybrid approach of information processing. In the long term, when qubits are more abundant, the theory of quantum error correction allows computation to proceed fault tolerantly by encoding quantum state such that errors can be detected and corrected, similar to the design of classical error-correcting codes.

Q9. Some experts say that we’ll never need quantum computing for everyday applications. What is your take on this?

Chong:  It is true that quantum computing will likely take the form of hardware accelerators for specialized applications.  Yet some of these applications include general optimization problems that may be helpful for many everyday applications.  Finance, logistics, and smart grid are examples of applications that may touch many users every day.

Q10. There is much speculation regarding the cybersecurity threats of quantum computing (**). What is your take on this?

Chong: In the long term, quantum machines may pose a serious challenge to the basis of modern cryptography. Digital commerce relies upon public-key cryptography systems that use a “pseudo-one-way function.” For example, it should be easy to digitally sign a document, but it should be hard to reverse-engineer the secret credentials of the person signing from the signature. The current most practical implementation of a one-way function is to multiply two large prime numbers together. It is hard to find the two primes from their product. All known classical algorithms take time exponential in the number of bits in the product (RSA key). This is the basis of RSA cryptography. Large quantum computers (running Shor’s algorithm) could find these primes in n^3 time for an n-bit key.

This quantum capability would force us to re-invent our cryptosystems to use other means of encryption that are both secure and inexpensive. An entire field of “post-quantum cryptography” has grown to address this problem. Secure solutions exist that resist quantum attacks, but finding any that are as simple as the product of two primes has been challenging. On the positive side, quantum computers and quantum networks may also help with security, providing new means of encrypting and securely communicating data.

Q10. What is the vision ahead for quantum computing?

Chong: Practical quantum computation may be achievable in the next few years, but applications will need to be error tolerant and make the best use of a relatively small number of quantum bits and operations.   Compilation tools will play a critical role in achieving these goals, but they will have to break traditional abstractions and be customized for machine and device characteristics in a manner never before seen in classical computing.

Q11. How does this book differ from other Quantum Computing Books? 

Ding: This is the first book to highlight research challenges and opportunities in near-term quantum computer architectures. In doing so, we develop the new discipline of “quantum computing systems”, which adapts classical techniques to address the practical issues at the hardware/software interface of quantum systems. As such, this book can be used as an introductory guide to quantum computing for computer scientists and engineers, spanning a range of topics across the systems stack, including quantum programming, compiler optimizations, noise mitigation, and simulation of quantum computation.

 ———————————

Professor Fred Chong, April 3, 2019 at Crerar. (Photo by Jean Lachat)

Professor Fred Chong, April 3, 2019 at Crerar. (Photo by Jean Lachat)

Fred Chong is the Seymour Goodman Professor in the Department of Computer Science at the University of Chicago. He is also Lead Principal Investigator for the EPiQC Project (Enabling Practical-scale Quantum Computing), an NSF Expedition in Computing. Chong received his Ph.D. from MIT in 1996 and was a faculty member and Chancellor’s fellow at UC Davis from 1997-2005. He was also a Professor of Computer Science, Director of Computer Engineering, and Director of the Greenscale Center for Energy-Efficient Computing at UCSB from 2005-2015. He is a recipient of the NSF CAREER award, the Intel Outstanding Researcher Award, and 9 best paper awards. His research interests include emerging technologies for computing, quantum computing, multicore and embedded architectures, computer security, and sustainable computing. Prof. Chong has been funded by NSF, DOE, Intel, Google, AFOSR, IARPA, DARPA, Mitsubishi, Altera and Xilinx. He has led or co-led over $40M in awarded research, and been co-PI on an additional $41M.  

 yongshan_ding

Yongshan Ding is a PhD candidate in the Department of Computer Science at the University of Chicago, advised by Fred Chong. Before UChicago, he received his dual B.Sc. degrees in Computer Science and Physics from Carnegie Mellon University. His research interests are in the areas of computer architecture and algorithms, particularly in the context of quantum computing. His work spans broadly in the theory and application of quantum error correction, efficient and reliable quantum memory management, and optimizations at the hardware/software interface. 

Resources

9781681738666

– Quantum Computer Systems. Research for Noisy Intermediate-Scale Quantum Computers. Yongshan Ding, University of Chicago, Frederic T. Chong, University of Chicago, Morgan & Claypool Publishers, ISBN: 9781681738666 |  2020 | 227 Pages, Link to Web Site


Related Posts
– On using AI and Data Analytics in Pharmaceutical Research. Interview with Bryn Roberts. ODBMS Industry Watch. September 10, 2018

Quote from Industry:  

” What I’m particularly excited about just now is the potential of universal quantum computing (QC). Progress made over the last couple of years gives us more confidence that a fault-tolerant universal quantum computer could become a reality, at a useful scale, in the coming years. We’ve begun to invest time, and explore collaborations, in this field. Initially, we want to understand where and how we could apply QC to yield meaningful value in our space. Quantum mechanics and molecular dynamics simulation are obvious targets, however, there are other potential applications in areas such as Machine Learning. I guess the big impacts for us will follow “quantum inimitability” (to borrow a term from Simon Benjamin from Oxford) in our use-cases, possibly in the 5-15 year timeframe, so this is a rather longer-term endeavour.” — Dr. Bryn Roberts, Global Head of Operations for Roche Pharmaceutical Research & Early Development.

Follow us on Twitter @odbmsorg
Jul 23 20

Thirty Years C++. Interview with Bjarne Stroustrup

by Roberto V. Zicari

“If you keep your good ideas to yourself, they are useless; you could just as well have been doing crossword puzzles. Only by articulating your ideas and making them accessible through writing and talks do they become a contribution.” –Bjarne Stroustrup

Back in 2007 I had the pleasure to interview Bjarne Stroustrup, the inventor of  C++ programming language. Thirteen years later…, I still have the pleasure to publish an interview with Bjarne.

RVZ

Q1. You have learned the fundamentals of object-oriented programming from Kristen Nygaard co-inventor of the Simula object-oriented programming (together with Ole-Johan Dahl back in the 1960s) who often visited your university in Denmark. How did Kristen Nygaard influence your career path?

Bjarne Stroustrup:  Kristen was a very interesting and impressive character. He was, of course, highly creative, and also a giant in every way. For starters he was about 6’6” and apparently quite as wide. When so inspired, he could deliver crushing bear hugs. Having a discussion with him on any topic – say programming, crime fiction, or labor policies – was invariably interesting, sometime inspiring.

As a young Masters student, I met him often because my student office was at the bottom of the stairs leading to the guest apartment. Each month he’d come down from Oslo for a week or so. Upon arrival, he’d call to me (paraphrasing) “round up the usual suspects” and my job was then to deliver half-a-dozen good students and a crate of beer. We then talked – meaning that Kristen poured out information on a variety of topics – for a couple of hours. I learned a lot about design from that, and the basics of object-oriented programming; the Scandinavian school of OOP, of course, where design and modeling the real world in code play major roles.

Q2. In 1979, you received a PhD in computer science from the University of Cambridge, under the supervision of  David Wheeler. What did you learn from David Wheeler that was useful for your future work?

Bjarne Stroustrup:  David Wheeler was very much in a class of his own. His problem-solving skills and design abilities were legendary. His teaching style was interesting. Each week, I came to his office to tell him what great ideas I had had or encountered during the week. His response was predicable along the lines of “yes, Bjarne, that’s not a bad idea; in fact, we almost used that for the EDSAC-2.” That is, he’d had that idea about the time I entered primary school and rejected it in favor of something better. I hear that some students found that style of response hard to deal with, but I was fascinated because David then proceeded to clarify my original ideas, evaluate them in context and elaborate on their strengths, weaknesses, possible improvements, and alternatives. With a few follow-up questions from me, he’d continue discussing problems, solutions, and tradeoffs for an hour or more. He taught me a lot about how to explore design spaces and also how to explain ideas – always based on concrete examples. I found his formal lectures terminally boring, and I don’t think he liked giving them, his strengths were elsewhere.

On my first day in Cambridge, he asked me “what’s the difference between a Masters and a PhD?” I didn’t know. “If I have to tell you what to do, it’s a Masters” he said and proceeded to – ever so politely – indicate that a Cambridge Masters was a fate worse than death. I didn’t mind because – as he had probably forgotten – I had just completed a perfectly good Masters in Mathematics with Computer Science from the University of Aarhus. In the years he supervised me, I don’t think he gave me more than one single direct advice. On my last day before leaving Cambridge after completing my thesis, he took me out to lunch and said “you are going to Bell Labs; that’s a very good place with many excellent people, but it’s also a bit of a black hole: good people go in and are never heard from again, whatever you do, keep a high external profile.” That fitted perfectly with my view that if you keep your good ideas to yourself, they are useless; you could just as well have been doing crossword puzzles. Only by articulating your ideas and making them accessible through writing and talks do they become a contribution.

David had a great track record with both hardware and software. That appealed to me.

Both David Wheeler and Kristen Nygaard honest, kind, and generous people – people you could trust and who worked hard for what they believed to be important.

Q3. You are quoted saying, that you designed C++ back in 1979 to answer to the question “How do you directly manipulate hardware and also support efficient high-level abstraction?” Do you still believe this was a good idea? 

Bjarne Stroustrup:  Definitely! Dennis Ritchie famously distinguished between languages designed “to solve a problem” and languages designed “to prove a point.” Like C, C++ is of the former category. The borderline between software and hardware is interesting, challenging, constantly changing, and constantly increasing in importance. The fundamental idea of C++ was to provide support for direct access to hardware, based on C’s model and then to allow people to “escape” to higher levels of expression through (what became known as) zero-overhead abstraction. There seems to be a never-ending need for code in that design space. I started with C plus Simula-like classes and over the years improvements (such as templates) have greatly increased C++’s expressive power and optimizability.

Q4. Why did you choose C as a base for your work? 

Bjarne Stroustrup: I decided to base my new tool/language one something, rather than start from scratch. I wanted to be part of a technical community and not re-hash all the fundamental design decisions. I knew at least a dozen languages that gave flexibility and good access to hardware facilities that I could have built upon. For example, I was acquainted with Algol 68 and liked its type system, but it didn’t have much of an industrial community.
C’s support for static type checking was weak, but the local support and community was superb: Dennis Ritchie and Brian Kernighan were just across the corridor from me! Also, its way of dealing with hardware was excellent, so I decided base my work on C and add to it as needed, starting with function argument checking and classes with constructors and destructors.

Q5. You also wrote that one way of looking at C++ is as the result of decades of three contradictory demands: Make the language simpler! Add these two essential features now!! Don’t break (any of) my code!!! Can you please explain what do you mean with these demands? 

Bjarne Stroustrup:  Many people have very reasonable wishes for improvement, but often those wishes are contradictory and any good design must involve tradeoffs.

  • Clearly C++ has undesirable complexity and “warts” that we’d like to remove. I am on record saying that it would be possible to build a language 1/10th of the size of C++ (by any measure) without reducing its expressive power or run-time power (HOPL3). That would not be easy, and I don’t think current attempts are likely to succeed, but I consider it possible and desirable.
  • Unfortunately, achieving this reasonable aim would break on the order of half a trillion lines of code, outdate huge amounts of teaching material, and outdate many programmers’ hard-earned experience.
    Many, possibly even most, organizations would still find themselves dependent on C++ for many more years, possibly decades. Automatic and guaranteed correct source-to-source translation could ease the pain, but it is hard to translate from messy code to cleaner code and much crucial C++ code manipulates tricky aspects of hardware.
  • Finally, few people want just simplification. They also want new facilities, that will allow them to cleanly express something is very hard to express in C++. They want novel features that necessarily makes the language bigger.
  • We simply cannot have all we want. That should not make us despondent or paralyzed, though. Progress is possible, but it involves painful compromises and careful design. It is worth remembering that every long-lived and widely used language will contain feature that in retrospect could be seriously improved or replaced with better alternatives. It will also have a large code base that doesn’t live up to modern standards of design and implementation. This is an unavoidable price of success.

Q6. C++ 1979-2020. What are the main lessons you have learned in all these years?

Bjarne Stroustrup:  There are many lessons, so it is hard to pick a main one. I assume you mean programming language design lessons.

Fundamental decisions are important and hard to change. Once in real-world use, basic language decisions cannot be changed. Fashions are seductive and hard to resist, but change over timespans shorter than the lifetime of a language. It is important to be a bit humble and suspicious about one’s own certainties. Often, the first reasonable solution you find isn’t the best in the longer run. Stability over decades is a feature. You don’t know what people are going to use the language for, or how. No one language and no one programming style will serve all users well.

Complete type safety and complete general resource management have been ideals for C++ from the very beginning (1979). However, given the need for generality and uncompromising performance, these were ideals that could be approached only incrementally as our understanding and technology improved. Arbitrary C++ code cannot be guaranteed type- and resource-safe and we cannot modify the language to offer such guarantees without breaking billions of lines of code. However, we have now reached the point where we can guarantee complete type safety and resource safety by using a combination of guidelines, library support, and static analysis: The C++ Core Guidelines. I outlined the principles of the Core Guidelines in a 2015 paper. Currently a static analyzer supporting the Core Guidelines ships with Microsoft Visual Studio. I hope to see support for the Guidelines that is not part of a single implementation so that their use could become universal.

My appreciation of tool support has grown over the years. We don’t write programs in just a programming language, but in a specific tool chain and specific environment made up of libraries and conventions.
The C++ world offers a bewildering variety of tools and libraries. Many are superb, but there is no dominant “unofficial standards” so it is very hard to choose and to collaborate with people who made different choices. I hope for come convergence that would significantly help C++ developers and C++ teaching. My HOPL-4 paper,Thriving in a crowded and changing world: C++ 2006–2020, has a discussion of that.

Q7. Who is still using C++?

Bjarne Stroustrup:  More developers than ever. C++ is the basis of many, many systems and applications, including some of our most widely used and best-known systems. My HOPL-4 paper, Thriving in a crowded and changing world: C++ 2006–2020, has a discussion of that. Major users include Google, Facebook, the semiconductor industry, gaming, finance, automotive and aerospace, medicine, biology, high-energy physics, and astronomy. Much is, however, invisible to end users.

Developers are hard to count, but surveys say about 4.5 million C++ users, and increasing. I have even heard “5 millions.” We don’t really have good ways of counting. Many measures, such as Tiobe, count “noise”; that is, mentions on the Web, but one enthusiastic student posts much more than 200 busy developers of important applications.

Q8. In this time of Artificial Intelligence, is C++ still relevant? 

Bjarne Stroustrup:  Certainly! C++ is the basis of most current AI/ML. Most new automobile software is C++, as is much high-performance software. Whatever language you use for AI/ML, the implementation usually critically involves some C++ library, such as Tensorflow. A serious data science scientist expressed it like this: I spend 97% of my time writing Python and my computer uses 98.5% of its cycles running C++ to execute that.

Q9. What are in your opinion the most interesting programming languages currently available? 

Bjarne Stroustrup: Maybe C++. Many of the ideas that are driving modern language development comes from C++ or have been brought into the mainstream through C++: RAII for general resource management. Templates for generic programming. Templates and constexpr function for compile-time evaluation. Various concurrency mechanisms. In turn, C++ of course owes much to earlier languages and research. For future developments that will affect programming techniques, I’d keep an eye on static reflection.

Much interesting work is going on in functional languages and in “Scripting” (e.g., TypeScript).

Q10. Why did you decide to leave a full-time job in academia and join Morgan Stanley?

Bjarne Stroustrup:  There were a few related reasons.

Over a decade, I had done most of the things a career academic do: teaching undergraduates, teaching graduate students, graduating PhDs, curriculum planning, written textbooks (e.g. Programming — Principles and Practice Using C++ (Second Edition)), written conference and journal research papers (e.g., Specifying C++ Concepts), applied and received research grants, sat on university committees. It was no longer new, interesting, and challenging.

I felt that I needed to get back “to the coal face”, to industry, to make sure that my work and opinions were still relevant. My interests in scale, reliability, performance, and maintainability were hard to pursue in academia.

I felt the need to get closer to my family in New York City and in Europe.

Morgan Stanley was in New York City, had very interesting problems related to reliability and performance of distributed systems, large C++ code bases, and – a bit of a surprise to me given the reputation of the finance industry – many nice people to work with.

Q11. You are also a Visiting Professor in Computer Science at Columbia University. What is the key message you wish to give to young students?

Bjarne Stroustrup:  Our civilization depends critically on software. We must improve our systems and to do that we need to become more professional. That’s the same message I’d try to send to experienced developers, managers, and executive.

Also I talk about the design principles of C++ and show concrete examples of how they were put into practice over the decades. You cannot teach design in the abstract.

Qx Anything else you wish to add?

Bjarne Stroustrup:  Education is important, but not everyone who want to write software needs the same education. We should make sure that there is a well-supported path through the educational maze for people who will write our critical systems, the ones we rely upon for our lives and livelihoods. We need to strive for a degree of professionalism equal to what we see in the best medical doctors and engineers.
I wrote a couple of papers about that: What should we teach software developers? Why? And Software Development for Infrastructure .

——————————-

Bjoern

Bjarne Stroustrup is the designer and original implementer of C++ as well as the author of The C++ Programming Language (4thEdition) and A Tour of C++ (2nd edition), Programming: Principles and Practice using C++ (2nd  Edition), and many popular and academic publications.
Dr. Stroustrup is a Technical Fellow and Managing Director in the technology division of Morgan Stanley in New York City as well as a visiting professor at Columbia University. He is a member of the US National Academy of Engineering, and an IEEE, ACM, and CHM fellow. He is the recipient of the 2018 NAE Charles Stark Draper Prize for Engineering and the 2017 IET Faraday Medal. He did much of his most important work in Bell Labs.
His research interests include distributed systems, design, programming techniques, software development tools, and programming languages. To make C++ a stable and up-to-date base for real-world software development, he has been a leading figure with the ISO C++ standards effort for 30 years. He holds a master’s in Mathematics from Aarhus University and a PhD in Computer Science from Cambridge University, where he is an honorary fellow of Churchill College.

www.stroustrup.com

Related Posts

– 10+1 Questions on Innovation to Bjarne Stroustrup. ODBMS Industry Watch, November 13, 2007

– 2 More Questions to Bjarne Stroustrup: Locations, People and Innovation. ODBMS Industry Watch, December 10, 2007

Follow ODBMS.org on Twitter: @odbmsorg

##

Jun 8 20

Fighting Covid-19 with Graphs. Interview with Alexander Jarasch

by Roberto V. Zicari

“There are an enormous amount of applications that we can provide. Just to mention a view of them: Scanning literature and patents for genes, proteins, targets and drugs. Finding information in clinical trials, which drugs are used and what inclusion/exclusion criteria exist. Automatically finding and querying genes with their synonyms and connected gene function and in which tissues they are expressed.” –Alexander Jarasch

I have interview Dr. Alexander Jarasch, head of Data and Knowledge management department at the German Center for Diabetes Research (DZD). Alexander is a team member of  The CovidGraph Project.
CovidGraph is a non-profit collaboration of researchers, software developers, data scientists and medical professionals.
The aim of the project is to help researchers quickly and efficiently find their way through COVID-19 datasets and to provide tools that use artificial intelligence, advanced visualization techniques, and intuitive user interfaces.

RVZ

Q1. What is CovidGraph?

Alexander Jarasch: CovidGraph is a knowledge graph that connects text data from scientific literature and intellectual property with clinical trials, drugs and entities from biomedical research such as genes, proteins, their function and regulation.

Q2. What is the aim of this project?

Alexander Jarasch: Provide researchers with Covid-19 relevant information that is connected and easy to query. Usually, this work requires several manual, tedious and error prone work that can be speed up by using CovidGraph.

Q3. Who Is this project aimed at?

Alexander Jarasch: Researches, medical doctors regardless of their field of research. Since Covid-19 has many complications it is a valuable resource for getting connected data.

Q4. What data sets do you use?

Alexander Jarasch: Public datasets from literature, patents, case numbers, clinical trials, therapeutic targets, genes, transcripts, proteins, gene ontology, gene expression data, pathways. There are more to come. (See list at the end of the interview)

Q5. How do you check the quality and reliability of the COVID-19 datasets you use in your project?

Alexander Jarasch: The datasources are well established databases that are used and cited for years.

Q6. Why using Knowledge graphs?

Alexander Jarasch: Research data, especially in healthcare is highly connected, very heterogenous and often unstructured. Today, these datasources are siloed and connections between them isn’t available. Connecting the datasources enables a more comprehensive view on it. By the fact of connecting data in insights occur, that have been hidden before.

Q7. Which tools are you developing to explore papers, patents, existing treatments and medications around the family of the corona viruses?

Alexander Jarasch: One the one hand we provide user interfaces for interactive data browsing and querying. For example, users can use Linkurious , Graphiken, derive GmbH and Neo4j Bloom. On the other hand we develop a more specific UI for users from biomedical research together with yWorks.

Q8. What are the applications that The CovidGraph project provides?

Alexander Jarasch: There are an enormous amount of applications that we can provide. Just to mention a view of them: Scanning literature and patents for genes, proteins, targets and drugs. Finding information in clinical trials, which drugs are used and what inclusion/exclusion criteria exist. Automatically finding and querying genes with their synonyms and connected gene function and in which tissues they are expressed.

Q9. Who is maintaining the data stored in the Knowledge Graph? Is it centralized or distributed?

Alexander Jarasch: It’s maintained by a community of volunteers and data is stored on a publicly accessible server.

Q10. The COVID*Graph should provide the data basis for understanding the processes involved in a coronavirus infection. What did you learn so far?

Alexander Jarasch: In parallel to data integration and preliminary data analysis we found that Covid-19 is supposed to affect more than just lung cells. Researchers also support this finding and can be found in several articles. We found out that ACE2 (Angiotensin-converting enzyme 2) is the gene that is mentioned most in scientific articles. This seems obvious since this is the receptor the corona virus uses to access the cells.

Qx Anything else you wish to add?

Alexander Jarasch: We are a private-public partnership and volunteers from several companies working with graph technology. We are non-profit community and hope to support researchers and doctors to find a cure for Covid-19 / Sars-Cov-2 and related diseases.

——————————–
23858_7830
Dr. Alexander Jarasch is the head of Data and Knowledge management department at the German Center for Diabetes Research (DZD). His team supports scientists from basic research and and clinical research with IT solutions from data management to data analysis. New insights from diabetes research and its complications are stored in a knowledge graph connecting data from basic research, animal models and clinical trials.
Dr. Jarasch received his PhD in structural bioinformatics and biochemistry from Ludwig-Maximilians University (LMU) in Munich and has a master’s degree in bioinformatics from the LMU and the Technical University of Munich.
He completed his postdoctoral trainings on behalf of Evonik Industries AG and Roche Diagnostics GmbH.

Resources

The CovidGraph Project

Covidgraph.org  on GitHub

bioRxiv (pronounced “bio-archive”) is a free online archive and distribution service for unpublished preprints in the life sciences. It is operated by Cold Spring Harbor Laboratory, a not-for-profit research and educational institution. By posting preprints on bioRxiv, authors are able to make their findings immediately available to the scientific community and receive feedback on draft manuscripts before they are submitted to journals.

medRxiv (pronounced “med-archive”) is a free online archive and distribution server for complete but unpublished manuscripts (preprints) in the medical, clinical, and related health sciences.

The Lens is building an open platform for Innovation Cartography. Specifically, the Lens serves nearly all of the patent documents in the world as open, annotatable digital public goods that are integrated with scholarly and technical literature along with regulatory and business data. The Lens will allow document collections, aggregations, and analyses to be shared, annotated, and embedded to forge open mapping of the world of knowledge-directed innovation.

Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotate genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species.

Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide.

New UniProt portal for the latest SARS-CoV-2 coronavirus protein entries and receptors, updated independent of the general UniProt release cycle.

RefSeq: NCBI Reference Sequence Database. A comprehensive, integrated, non-redundant, well-annotated set of reference sequences including genomic, transcript, and protein.

The Gene Ontology resourceThe mission of the GO Consortium is to develop a comprehensive, computational model of biological systems, ranging from the molecular to the organism level, across the multiplicity of species in the tree of life.

The GTEx Portal. The Genotype-Tissue Expression (GTEx) project is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation.

REACTOME is an open-source, open access, manually curated and peer-reviewed pathway database. Our goal is to provide intuitive bioinformatics tools for the visualization, interpretation and analysis of pathway knowledge to support basic and clinical research, genome analysis, modeling, systems biology and education.

ClinicalTrials.gov is a resource provided by the U.S. National Library of Medicine.

COVID-19 Response United Nations

COVID-19 Resources Johns Hopkins University.

COVID-19 datasets

COVID-19 Open Research Dataset (CORD-19)

In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of over 44,000 scholarly articles, including over 29,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.
https://pages.semanticscholar.org/coronavirus-research

The Lens COVID-19 Datasets

The Lens has assembled free and open datasets of patent documents, scholarly research works metadata and biological sequences from patents, and deposited them in a machine-readable and explorable form.
https://about.lens.org/covid-19/

Ensembl Genome Browser

Ensembl is a genome browser for vertebrate genomes that supports research in comparative genomics, evolution, sequence variation and transcriptional regulation. Ensembl annotate genes, computes multiple alignments, predicts regulatory function and collects disease data. Ensembl tools include BLAST, BLAT, BioMart and the Variant Effect Predictor (VEP) for all supported species. http://www.ensembl.org

NCBI Gene Database

Gene integrates information from a wide range of species. A record may include nomenclature, Reference Sequences (RefSeqs), maps, pathways, variations, phenotypes, and links to genome-, phenotype-, and locus-specific resources worldwide. https://www.ncbi.nlm.nih.gov/gene

The Gene Ontology Resource

The Gene Ontology (GO) knowledgebase is the world’s largest source of information on the functions of genes. This knowledge is both human-readable and machine-readable, and is a foundation for computational analysis of large-scale molecular biology and genetics experiments in biomedical research. http://geneontology.org

2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE

This is the data repository for the 2019 Novel Coronavirus Visual Dashboard operated by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE). Also, Supported by ESRI Living Atlas Team and the Johns Hopkins University Applied Physics Lab (JHU APL). https://github.com/CSSEGISandData/COVID-19

United Nations World Population Prospects 2019

The 2019 Revision of World Population Prospects is the twenty-sixth round of official United Nations population estimates and projections that have been prepared by the Population Division of the Department of Economic and Social Affairs of the United Nations Secretariat. https://population.un.org/wpp/

Follow ODBMS.org on Twitter: @odbmsorg

##

Apr 28 20

On Vertica 10.0 Interview with Mark Lyons

by Roberto V. Zicari

“Supporting arrays, maps and structs allows customer to simplify data pipelines, unify more of their semi-structured data with their data warehouse as well as maintain better real world representation of their data from relationships between entities to customer orders with item level detail. A good example is groups of cell phone towers that are used for one call while driving on the highway.” –Mark Lyons

I have interviewed Mark Lyons, Director of Product Management at Vertica. We talked about the new Vertica 10.0

RVZ

Q1. What is your role at Vertica?

Mark Lyons: My role at Vertica is Director of Product Management. I have a team of 5 product managers covering analytics, security, storage integrations and cloud.

Q2. You recently announced Vertica Version 10. What is special about this release?

Mark Lyons: Vertica 10.0 is a milestone release and special in many ways from Eon Mode improvements to TensorFlow integration for trained models, and the ability to query complex data types like arrays, maps and structs. This release delivers on all aspects of why customers choose Vertica including performance improvements and our constant dedication to keeping our platform in front of the competition from an architecture standpoint.

Q3. Specifically why and how did you improve Vertica in Eon Mode?

Mark Lyons: The Eon Mode improvements are a long list of features including faster elasticity with sub-clusters, stronger workload isolation between sub-clusters and more control over the depot for performance tuning. Also Eon Mode is now available on two new communal storage options Google Cloud Platform and Hadoop Distributed File System (HDFS) in addition to what we already support which is Amazon Web Services S3, Pure Storage Flash Blades and MinIO. From here we are working on adding Azure and Alibaba for public cloud options and expanding our on-prem options with other vendors that our customers have shown interest in like Dell/EMC ECS storage offering and others.

Eon mode is run worldwide by many of our largest customers in production at this point and if you are interested in learning more about the scale and flexibility I recommend looking at our case study with The Trade Desk. They now run 2 Eon Mode clusters both at 320 nodes and petabytes of data growing every day.

Q4. So, one of your key improvement is to support complex types to improve integration with Parquet. How will that benefit businesses?

Mark Lyons: We’ve had high performance query ability on Parquet data whether that is on HDFS or S3 for years including column pruning and predicatepushdown. We continue to invest in our Parquet integration since it is an important part of many organizations’ analytics & data lake strategy. Over the past couple of releases we’ve been building the ability to query complex data types while maintaining columnar execution and late materialization for high performance.

Supporting arrays, maps and structs allows customer to simplify data pipelines, unify more of their semi-structured data with their data warehouse as well as maintain better real world representation of their data from relationships between entities to customer orders with item level detail. A good example is groups of cell phone towers that are used for one call while driving on the highway. We have seen tremendous interest from our customers in this new functionality. We have been actively testing preview builds of Vertica 10 with many customers for querying maps, arrays and structs for some time now.

Q5. How will this new release benefit people using Vertica in Enterprise Mode, people who or aren’t even on the Cloud and have no plans to go to the Cloud?

Mark Lyons: Vertica Enterprise Mode benefits from all of the improvements to the query optimizer, execution engine, machine learning, complex data types and beyond since there is only one Vertica code base and the Eon mode differences are limited to communal storage and sub-clusters. Enterprise Mode is the traditional direct attached storage. Massively Parallel Processing (MPP), shared nothing architectures are still appropriate for many organizations that don’t have plans to move to public cloud and have traditional data center infrastructure. Vertica doesn’t restrict on-premises customers to only using shared storage options like Pure Storage Flash Blades or HDFS. With a Vertica license these customers have the flexibility to deploy where they want in whichever architecture fits today, and they can change in the future without any license cost.

Q6. What are the features you offer in your in-database machine learning in Vertica?

Mark Lyons: Vertica is not normally thought of as a data science platform coming out of the MPP Column Store RDBMS space but we have built functions to make the data science pipeline very easy. We offer functions from data loading a variety of formats, enrichment, preparation and quality functions for data transformation as well as algorithms to train, score and evaluate models. There’s a lot more than I can begin to cover here. To learn more I suggest reading about the Vertica machine learning capabilities here.

In Vertica 10.0 we’ve added the capabilities to import and export PMML models to support data scientist training models in other tools like Python or Spark or use cases where a model trained in Vertica should be pushed to the edge for evaluation in a complex event processing/streaming system. We also added TensorFlow integration to import deep learning models trained on GPUs outside of Vertica into the data warehouse for scoring and evaluation on new customer or device data as it arrives.

Q7. There are several companies offering data platforms for machine learning (e.g. Alteryx Analytics, H2O.ai, RapidMiner, SAS Enterprise Miner (EM), SAS Visual Analytics, Databricks Unified Analytics Platform, IBM SPSS, Microsoft Azure Machine Learning, Teradata Unified Data Architecture, InterSystems IRIS Data Platform). How does Vertica compare to the other data platforms offering machine learning features?

Mark Lyons: The Vertica differentiation compared to these other tools is all about bringing scale, concurrency, performance and operational ML simplicity to the story. All of the data science tools you mention have equivalent functions for data prep, modeling, algorithms etc. and with all of the work we have done the past 3+ years Vertica has much of the same functionality you find in those other tools except for the GUI. For that, many of our customers use Jupyter notebooks.

Vertica performance is achieved by first the MPP architecture and second by in-memory and auto spill to disk so we can handle the largest data sets without being limited by the compute power of a single node or by the memory available like many solutions are. The beauty of this is users do not have to be aware of this or do anything. It just works!  In addition to being able to train on trillions of rows and thousands of columns you get enterprise readiness with resource management between jobs, high concurrency to support many users on the same system at the same time, workload isolation to keep ML workloads from overwhelming other analytics areas, security built in with authentication & access policies, and compliance with logging/audit of all ML workflows.

Mark

Q8. What are the most successful use cases which use the Machine Learning features offered by Vertica?

Mark Lyons: We are solving all of the most common use cases from fraud detection to churn prevention to  cybersecurity threat analytics. Vertica brings a new level of scale and speed to allow for more frequent model re-training, and use of the entire dataset even if that is trillions of rows and PBs of data without data movement or down sampling.

Q9. Anything else you wish to add?

Mark Lyons: A few weeks ago we wrapped up our Vertica Big Data Conference 2020 and I recommend anyone interested in learning more to come and watch replays of the sessions.

———————-

Mark

Mark Lyons, Director of Product Management, Vertica
Mark leads the Vertica product management team. His expertise is in new product introductions, go-to-market planning, product roadmaps, requirement development, build/buy/partner analysis, new products, innovation and strategy.

Resources

Virtual Vertica Big Data Conference 2020

Related Posts

Top 10 Highlights from the Virtual Vertica BDC 2020

Follows us on Twitter: @odbmsorg

##

Apr 19 20

On Drones and Socio-Technical thinking. Interview with Gordon Hoople and Austin Choi-Fitzpatrick

by Roberto V. Zicari

Sociotechnical education is our way of talking about how to help students recognize the complex interconnection of the social and the technical. We bring students together from different majors, give them real problems to tackle, and then challenge them with reading and discussions that force them to face their own assumptions.”Gordon Hoople

“As we developed the class, and later wrote a book together, we realized how much engineering wrestles with social issues (whether it recognizes this or not) and how much social change efforts are supporting or resisting changes that engineers dreamed up in the first place.” –Austin Choi-Fitzpatrick

I have interviewed Gordon Hoople and Austin Choi-Fitzpatrick. We talked about Sociotechnical educationthe mission of The Good Drone Lab, their forthcoming book “Drones for Good. How to Bring Sociotechnical Thinking into the Classroom” and how to engage students in challenging conversations at the intersection of technology and society.

RVZ

Q1. What is a socio technical education?

Gordon: Sociotechnical education is our way of talking about how to help students recognize the complex interconnection of the social and the technical. This is as true for classroom assignments as it is in real world projects. Is Wikileaks and Russian interference in the United States’ 2016 election a story about technology, a story about politics, a story about society, or a stunning admixture of all three? Students have a real 0-60 moment when they get their first real job–we want to give them a head-start in that process!

Q2. You are are co-directors of “The Good Drone Lab”. What is it?

Austin: The Good Drone Lab, which I started with Tautvydas Juškauskas in 2014, is focused on tinkering and experimenting with the potential drones have for promoting the greater good. We’re exclusively focused on applications that level the playing field between the powerful and the powerless. How can we democratize surveillance, and how can we hold authorities to account, even in protests? More recently we’re also interested in exploring how people from the technical arts (like engineering) can work alongside folks from the social sciences (like sociology or ethnic studies).

Q3. Why did a social scientist decided to collaborate with an engineer, and an engineer with a sociologist, and together on a book about drones and sociotechnical thinking in the classroom?

Gordon: For fun! We’d be lying if we didn’t say up front that we think drones are cool and that we like working with one another. We’d also be lying if we didn’t say that there was some money involved! In the fall of 2016 our colleagues received a National Science Foundation grant for “Revolutionizing Engineering Departments.” We thought this would be a cool effort to join, so we pitched a collaborative class and crossed our fingers.

Austin: As we developed the class, and later wrote a book together, we realized how much engineering wrestles with social issues (whether it recognizes this or not) and how much social change efforts are supporting or resisting changes that engineers dreamed up in the first place. So, we had a spark, and from there we’ve built some very interesting fires. I’m not sure about that analogy, though!

Q4. Why do disciplinary silos create few opportunities for students to engage with others beyond their chosen major? Why do you think that engaging students in challenging conversations at the intersection of technology and society is a useful thing?


Austin: Universities are fossils. They were dreamed up four hundred years ago, and have been ticking along with only minor modifications ever since. That’s not entirely true, and we’re fortunate to work in institutional spaces that welcome innovation, but for the most part academics are hived off into their disciplines, and do a pretty good job self-policing so that we steer clear of one another. That’s a good way to avoid accidents. The problem is that if I steer clear of Gordon’s area of expertise, then we might not bump into one another! So we organize to prevent happy accidents. We think that’s silly. The world is made up of both hidebound institutions and happy accidents. We want our students to see that.

Gordon: So our idea is to take hackathons and maker spaces one step further, and push students together from all these different academic silos. Engineers and social change students both have to leave the university to work with people very different from them. We’re just moving some of that engagement into the classroom and our class projects.

Austin: The real world is fundamentally sociotechnical. All the time international aid groups, for example, are launching new initiatives around clean water; we’re saying this is good, but engineers, nonprofits, and local communities should all be working together. The alternative is one actor setting off on their own, and this often has unintended consequences. I mean, you remember the One Laptop Per Child campaign? Later it turned out that the thing it taught every student to do was to download pornography. If we want stuff to stick, we have to think sociotechnically.

Q5. Can you please explain your socio technical approach to  interdisciplinary education?

Gordon: We bring students together from different majors, give them real problems to tackle, and then challenge them with reading and discussions that force them to face their own assumptions. We pop into and out of small group discussions, ask all the engineering students to be quiet while they listen to peace studies students, then flip the roles. For a lot of our students it’s the first time they’ve done anything like this. It’s challenging, but they seem to like it.

Q6. Do you have any evidence-based pedagogies that your approach is working and is valuable?

Austin: Yes. First of all, students tell us it’s working. But we have also incorporated cutting-edge methods for measuring learning, and then published a bunch of that work in the usual academic outlets, like conferences and journals.

Measurement is central for us, because, even from the beginning, we were both very interested in figuring out whether our methods were translating to student learning in a way we could document. In an early iteration of the class we had the benefit of working closely with a post-doc, Dr. Beth Reddy, now a professor at Colorado School of Mines, who helped us by leading interviews, focus groups, and classroom observations to see what impacts we were having on the students. While we won’t rehash the full findings from those papers here, suffice to say we do think these methods are having a measurable impact.

Q7. What are the main obstacles for effective interdisciplinary teaching?

Gordon: Time! It takes time to do this right, to get on the same page, to communicate clearly to students. Students want to understand the material, and also want to know how to do well in a class. Fortunately, we both agree on those things, but it still takes time to plan the class, then to communicate everything to students in a way that adds more signal than noise.

Q8. In your book you write about The Ethics of Drones. Can you please elaborate on this?

Austin: We are very concerned that drone use will be reserved for the already-powerful. I’m a social movement scholar, and am focused on maintaining balances of power between the state and the people, and between the haves and the have-nots. What happens if only governments and big business have drones? We want to democratize access to important tools for holding the powerful to account. I wrote a whole different book about that (The Good Drone, MIT Press, link), and we wanted our students to wrestle with some of those broader questions, whether or not they agree with me.


 Author Bio Photo

Gordon Hoople is an assistant professor and a founding faculty member of Integrated Engineering Department at the University of San Diego’s Shiley-Marcos School of Engineering. His work focuses on engineering education and design. He is the principal investigator on the National Science Foundation Grant “Reimagining Energy: Exploring Inclusive Practices for Teaching Energy Concepts to Undergraduate Engineering Majors.” His design work occurs at the intersection of STEM and Art (STEAM). He recently completed the sculpture Unfolding Humanity, a 12 foot tall, two ton dodecahedron that explores the relationship between technology and humanity. Featured at Burning Man and Maker Faire, this sculpture brought together a team of over 80 faculty, students, and community members.

Austin Choi-Fitzpatrick is an associate professor of political sociology at the Kroc School of Peace Studies at the University of San Diego, and is concurrent associate professor of social movements and human rights at the University of Nottingham’s Rights Lab and School of Sociology and Social Policy. His work focuses on politics, culture, technology, and social change. His recent books include The Good Drone (MIT Press, 2020) and What Slaveholders Think (Columbia, 2017) and shorter work has appeared in Slate, Al Jazeera, the Guardian, Aeon, and HuffPo as well as articles in the requisite pile of academic journals.

Resources

Drones for Good. How to Bring Sociotechnical Thinking into the Classroom.  Gordon Hoople, University of San Diego, Austin Choi-Fitzpatrick, University of San Diego, University of Nottingham, ISBN: 9781681737744 | PDF ISBN: 9781681737751 Hardcover ISBN: 9781681737768 Copyright © 2020 | 111 Pages, Morgan & Claypool.

– The Good Drone: How Social Movements Democratize Surveillance (Acting with Technology). Austin Choi-Fitzpatrick, The MIT Press (July 28, 2020)

Related Posts

– Embedded EthiCS @ Harvard: bringing ethical reasoning into the computer science curriculum. ODBMS.org DECEMBER 17, 2019

– On CorrelAid: Data Science for Social Good. Q&A with André Lange.ODBMS.org AUGUST 28, 2019

Follow us on Twitter: @odbmsorg

##

 

Mar 19 20

On Continuous Integration and Software Flight Recording Technology. Interview with Barry Morris

by Roberto V. Zicari

“The key challenge, however, is the cultural change required within software engineering teams to evolve to a state where any software failure, no matter how insignificant it may seem, is unacceptable. No single software engineer, or team, possesses all of the technical experience required to keep a CI pipeline functioning at this level. There must be a cross-disciplined commitment to work towards this goal throughout the development lifecycle in order to be effective.” –Barry Morris

I have interviewed Barry Morris, well-know serial entrepreneur and currently CEO at Undo. We talked about the challenges to  deliver high quality software at a productive level, the cost of persistent failures in Continuous Integration (CI) pipelines, and how Software Flight Recording Technology could help.

RVZ

Q1. What are typical challenges software engineering teams face to deliver high quality software at a productive level?

Barry Morris: Reproducibility is the fundamental problem plaguing software engineering teams. The inability to rapidly, and reliably, reproduce test failures is slowing teams down. It blocks their development pipeline and prevents them from delivering software on time, and with confidence.

Organizations that can solve the issue of reproducibility are able to confidently deliver quality software on a scheduled, repeatable, and automated basis by eliminating the guesswork associated in defect diagnosis. The best part is that it does not require a complete overhaul of existing tool sets – rather an augmentation to current practices.

The key challenge, however, is the cultural change required within software engineering teams to evolve to a state where any software failure, no matter how insignificant it may seem, is unacceptable. No single software engineer, or team, possesses all of the technical experience required to keep a CI pipeline functioning at this level. There must be a cross-disciplined commitment to work towards this goal throughout the development lifecycle in order to be effective.

Q2. Software failures are inevitable. Do you believe the adoption of Continuous Integration (CI) as a key contributor to agile development workflows, is the solution?

Barry Morris: Despite the best efforts of software engineering teams, there are too many situational factors outside of their direct control that can cause the software to fail. As teams add new features, new processes, new microservices, and new threading to their code, the risk of unpredictable failures grows exponentially.

The adoption of CI as a key contributor to agile development workflows is on the rise. I believe it is the key to delivering software at velocity and offers radical gains in both productivity and quality. According to a recent survey conducted by Cambridge University, 88% of enterprise software companies have adopted CI practices.

Q3. It seems that the volume of tests being run as a result of CI leads to a growing backlog of failing tests. Is it possible to have a zero- tolerance approach to failing tests?

Barry Morris: Unfortunately, the volume of tests being run as a result of CI leads to a growing backlog of failing tests – ticking time bombs just waiting to go off – costing shareholders $1.2 trillion in enterprise value every year.

True CI requires a zero-tolerance approach to software failures. Tests must pass reliably and any failures represent new regressions. Failures that only show up once every 300 runs, or under extreme conditions only make this more challenging. The same survey also found that 83% of software engineers cannot keep their test suite clear of failing tests

Q4. You are offering a so called Software Flight Recording Technology (SFRT). What is it and what is it useful for?

Barry Morris: SFRT enables software engineering teams to record and capture all the details of a program’s execution, as it runs. The recorded output allows the team to then wind back the tape to any instruction that executed and see the full program state at that point. Whereas static analysis provides a prediction of what a program might do, SFRT provides complete visibility into what a program actually did, line by line.

SFRT can speed up time-to-resolution by a factor of 10 by eliminating guesswork, using real, actionable data-driven insights to get to the crux of the issue, faster. But the beauty of this kind of approach is that it is not simply a last line of defense against the most challenging defects (e.g intermittent bugs, concurrency defects, etc). Rather, it can be used to improve the time-to-resolution of all software failures.

Q5. Is SFRT the equivalent to a black box on an aircraft?

Barry Morris: Yes, absolutely.

Q6. When a plane crashes, one of the first things responders do is locate the black box on board. How does it relate to software failures?

Barry Morris: When a plane crashes, one of the first things responders do is locate the black box on board. This device tells them everything the plane did – its trajectory, position, velocity, etc. – right up until the moment it crashed. SFRT can do the same for software, allowing software engineering teams to view a recording of what a program was doing before, during, and after a defect occurs.

Q7. Who has already successfully used Software Flight Recording Technology to to capture test failures?

Barry Morris: SAP HANA, a heavily multi-threaded, feature-rich, in-memory database, is built from millions of lines of highly-optimized Linux C++ code. To ensure the software is high-quality and reliable, the engineering team invested considerably in CI and employed rigorous testing methodologies, including fuzz-testing.

However, non-deterministic test failures could not reliably be reproduced for debugging. Analyzing logs from failed runs could not capture enough information to identify the root cause of specific failures; and reproducing complex failures on live systems was time-consuming. This was slowing development down.

LiveRecorder, Undo’s platform based on Software Flight Recording Technology, was implemented to capture test failures. Recording files of those failing runs were then replayed and analyzed. With LiveRecorder, engineers could see exactly what their program did before it failed and why – allowing them to quickly hone-in on the root cause of software defects.

As a result, SAP HANA was able to accelerate software defect resolution in development, by eliminating the guesswork in software failure diagnosis. On top of significantly reducing time-to-resolution of defects, SAP HANA engineers managed to capture and fix 7 high-priority defects[1] – including a couple of race conditions, and a number of sporadic memory leaks and memory corruption defects.

Q8. What are the key questions to consider when developing CI success metrics?

Barry Morris: Every organization judges success differently. To some, finding a single, hard-to-reproduce bug per month is enough to deem changes to their CI pipeline as effective. Others consider the reduction in the amount of aggregate developer hours spent finding and fixing software defects per quarter as their key performance indicator. Speed to delivery, decrease in backlog, and product reliability are also common metrics tracked.

Whatever the success criteria, it should reflect the overarching goals of the larger software engineering team, or even corporate objectives. To ensure that teams measure and monitor the success criteria that matters most to them, software engineering managers and team leads should establish their own KPIs.

Some questions to consider when developing CI success metrics:

  • Is code shipped earlier than previous deployments?
  • How many defects are currently in the backlog compared to last week/month?
  • Are developers spending less time debugging?
  • Are other teams waiting for updates?
  • How many developer hours does it take to find and fix a single bug?
  • How long does it take to reproduce a failure?
  • How long does it take to fix a failure once found?
  • What is the average cost to the organization of each failure?

These questions are designed as an initial starting point. As mentioned earlier, each organization is different and places value on certain aspects of CI depending on team dynamics and needs. What’s important is to establish a baseline to ensure agreement and commitment across teams, and to benchmark progress.

——————————-

Barry Morris

Barry Morris, CEO, Undo.

With over 25 years’ experience working in enterprise software and database systems, Barry is a prodigious company builder, scaling start-ups and publicly held companies alike. He was CEO of distributed service-oriented architecture (SOA) specialists IONA Technologies between 2000 and 2003 and built the company up to $180m in revenues and a $2bn valuation.

A serial entrepreneur, Barry founded NuoDB in 2008 and most recently served as its Executive Chairman. Barry has now been appointed as CEO in September 2018 to lead Undo‘s high-growth phase.

Resources

– Research Report: The Business Value of Optimizing CI pipeline. Judge Business School from the University of Cambridge in partnership with Undo (link to download the report- registration required)

–  3 Key Findings from our CI Research Report, Undo Blog post:

The research concluded three key findings:

  1. Adoption of CI best practices is on the rise. 88% of enterprise software companies say they have adopted CI practices, compared to 70% in 2015
  2. Reproducing software failures is impeding delivery speed. 41% of respondents say getting the bug to reproduce is the biggest barrier to finding and fixing bugs faster; and 56% say they could release software 1-2 days faster if reproducing failures wasn’t an issue
  3. Failing tests cost the enterprise software market $61 billion. This equals 620 million developer hours a year wasted on debugging software failures

[1] Improving Software Quality in SAP HANA, 2018

– Technical Paper: Software Flight Recording Technology, Undo (link: registration required to download the paper.)

Related Posts

– On Software Reliability. Interview with Barry Morris and Dale Vile. ODBMS Industry Watch, April 2, 2019

– Go Green Stay Green. Q&A with Greg Law. ODBMS.org JULY 1, 2019

– On Software Quality. Q&A with Alexander Boehm and Greg Law. ODBMS.org, November 26, 2018

– Integrating Non-Volatile Memory into an Existing, Enterprise-Class In-Memory DBMS. By Alexander Böhm. ODBMS.org, JULY 18, 2017

Follow us on Twitter: @odbmsorg

##

 

Feb 13 20

On AI for Insurance and Risk Management. Interview with Sastry Durvasula

by Roberto V. Zicari

“AI in complex global industries is in a league of its own, with many opportunities, many risks and many rewards! We definitely see AI having a major impact on the entire risk and insurance industry value chain from improving customer experience to changing core insurance processes to creating next-gen risk products.” –Sastry Durvasula

I have interviewed Sastry Durvasula, Chief Digital Officer and Chief Data & Analytics Officer at Marsh, Inc.

RVZ

Q1: You are Marsh’s Chief Digital Officer and Chief Data & Analytics Officer. What are your main priorities?

Sastry Durvasula: My primary focus is leading Marsh’s global digital, data and analytics strategy and transformation, while building new digital-native businesses and growth opportunities. This includes development of next-gen digital platforms and products; data science and modelling; client-facing technology; and digital experiences for clients, carrier partners and colleagues. We also launched Marsh Digital Labs to incubate emerging tech, InsurTech partnerships, and forge industry alliances. Another key aspect of the role is to drive digital culture transformation across the company.

Q2: Can you talk briefly about Marsh Digital Labs?

Sastry Durvasula: We established Marsh Digital Labs as an incubator for developing innovative insurance products, running select tech experiments and supporting strategic engagements with clients, insurance carriers and InsurTechs. The Labs has an innovation funnel process whereby we select and move ideas from concepts to actual market pilots before handing off to the product teams for full-scale development. This allows us to be agile, fail fast and demonstrate product viability, which is critical in today’s fast-changing tech landscape. Our most recent pilot was RiskExchange, a blockchain for trade credit insurance, which was actually the winning idea from our global colleague hackathon called #marshathon.

We’re currently focused on three emerging tech areas  – AI/ML, Blockchain and IoT – and exploring a number of new insurance products and distribution channels in the small commercial and consumer sector, as well as in the sharing economy, cyber, autonomous vehicles, and worker safety areas. But ongoing R&D is a core component of the Labs, too, and we collaborate with a number of industry, academia and open-source initiatives. And we need to cut through all the hype and focus on use cases that create true business impact. For example, the Labs has a dedicated unit right now working on using AI and IoT to develop next-gen risk model capabilities that leverage new streams of real-time data, cloud-based platforms, and machine learning algorithms.

Q3: Can you talk a little bit about your overall data infrastructure and the new data streams you are exploring?

Sastry Durvasula: Yes, absolutely. We implemented the Marsh big data ecosystem leveraging multi-cloud platform and capabilities, advanced analytics and visualization tools, and API-based integrations. It has been built to support data in any format, source or velocity with dynamic scalability on processing and storage. Data privacy and governance are safeguarded with metadata and controls built-in.

Keep in mind that traditional risk management and insurance placement is mostly done using static exposure data that gets updated typically only during policy renewal. We are actively working on changing the game by bringing in a wide variety of newer data streams, including IoT data and other external sources, in order to quantify and manage risks better.

For example, in the marine and shipping industry this includes behavioral data such as vessel statistics, movements, machinery and weather information, combined with historical claims data. We can get a more accurate picture of risk and can price more accurately. To assist with these metrics, we recently launched a partnership with InsurTech firm Concirrus that specializes in marine analytics. Similarly, in property risk we are looking at factors such as building integrity as measured by vibrations or earthquake potential,  damage from water leakage as measured by sensors or actuators, and so on. In telematics, we can use real-time GPS and speed data, as well as driving behavioral data like braking, acceleration and so on.

We are also researching the overall risk profile of smaller enterprise clients by leveraging third-party external sources such as news, social, government and other regulatory or compliance filings. So, there is a wide variety of data and data types that we deal with or are actively exploring.

Q4: What is exciting about AI in the insurance and risk management space?

Sastry Durvasula: AI in complex global industries is in a league of its own, with many opportunities, many risks and many rewards. We definitely see AI having a major impact on the entire risk and insurance industry value chain from improving customer experience to changing core insurance processes to creating next-gen risk products.

Underwriting based on AI models working on dynamic data streams will result in usage-based and on-demand insurance offerings. We will also see systems that allow straight-through quoting, placement and binding of selected risks powered by AI. For insurance brokers and carriers, this will allow more intelligent risk selection methods.

Claims is another area where AI will have a major impact including automated claims management, claims fraud detection, and intelligent automation of the overall process.

Accelerating use of AI in many industries will have an impact on risk liability models. For example, as AI-powered autonomous vehicles become mainstream, liability shifts from personal auto coverage to a commercial product liability held by the manufacturer. So the insurance industry a decade from now may look quite different from today.

We have also been working on conversational AI and chatbots to support various client-facing and colleague-facing initiatives. AI will play a big role in intelligent automation – insurance is an industry with vast numbers of documents and is very manual and process-oriented. By providing AI-powered human-augmentation functions that improve and enhance the manual processes, we will see efficiencies in the overall industry.

Q5: What are some of the emerging risk and insurance products that Marsh is working on?

Sastry Durvasula: We have several new products targeting either different risks or different market segments. We recently launched Blue[i] next-gen analytics and AI suite, powered by Marsh’s big data ecosystem. Many of Marsh’s big enterprise customers retain significant risk in their portfolio. In fact, in many cases, the premium paid for risk transfer to the insurance markets is only a certain percentage of the Total Cost of Risk (TCOR) to that company. Worker’s compensation is one of the biggest risks in the US, costing employers nearly $100B annually. Our Blue[i] ML models powered by behavioral data and real-time insights help with the prediction of and reduction in claims, as well as reduction in insurance premiums.

Cyber Risk is definitely one of the fastest growing risk categories in the world. We launched market-leading solutions to understand, quantify and manage an enterprise’s cyber risk. These include several proprietary ways of quantifying cyber exposure, cyber business interruption and data breach impacts. These techniques will get more sophisticated as we build out our AI capabilities and increase our data sources.

Pandemic risk is another emerging risk category that we are building out solutions for. In partnership with a Silicon Valley based startup called Metabiota and re-insurer MunichRe, we have created an integrated pandemic risk quantification and insurance solution targeted at key industries in the travel, aviation, hospitality and educational sectors.

In addition to emerging risk products, we have also been innovating on digital solutions in the small commercial and consumer space. We launched Bluestream, a cloud-based digital broker platform for affinity clients, providing them with a new, streamlined way to offer insurance products and services to their customers, contractors, and employees.

Q6: Can you elaborate on how AI and IoT enable real-time risk management?

Sastry Durvasula: AI and new IoT data streams are making real-time risk management a possibility because enterprises have an up-to-the minute view of changing risk exposures and can effectively take actions to mitigate them. It changes how risks are calculated – from traditional actuarial models based on historic events to AI-powered analytics that support dynamic views of risk-triggering mitigating actions.

For example, in the marine use case, cargo insurance policies can be repriced in real-time based on the operator behavior, value of cargo, sea and weather conditions, and many other dynamic variables. In addition to repriced risk, the operator can also be ‘nudged’ to take less riskier actions in exchange for reduced insurance pricing.

We are also actively leveraging wearables to drive reduction in workers compensation claims based on repetitive motion as well as to improve worker safety. By using data from wearables such as smart belts that measure an employee’s sitting, standing, bending, twisting, walking and other repetitive motion actions, dashboards are created to collect and show individual and aggregate movement and locations. Our models recommend ways to improve the client’s safety as well as ergonomic plans to reduce injury and claims likelihood.

Q7: There is a lot of concern around possible malicious use of AI as the technology progresses. Can you talk about some of the risks posed by AI?

Sastry Durvasula: Definitely, this is an important area for us going into the future. AI models are not perfect at all – in fact, far from it. AI models trained on data sets containing unintentional human biases will reflect that same prejudice in their predictions. We are also starting to see more and more cases where opaque AI models resulted in inscrutable errors that were only uncovered after lengthy lawsuits. As more and more complex AI algos and models make their way to the enterprise, it has become very urgent to incorporate accountability and trust criteria into different stages of the model creation. This includes being on the lookout for bias in training data to ‘explainable and interpretable’ models and to have a meaningful appeals process. Definitely, thoughtful regulation needs to be introduced in a way not to impede the technological progress, but to push it in the right direction.

In addition to the above, AI is already causing major headaches by amplifying the ability of bad actors – whether it is automating hacking attempts that make corporate security even harder, or causing broader global harm with fake news and propaganda, or making existing weaponry more destructive. As mentioned earlier, cyber risk is the fastest growing risk category and AI will only add more fuel to the fire.

Not everything around AI is increasing risk, though. Apparently, 90% of auto accidents are caused by human errors. So in this case the rise of AI-powered autonomous vehicles may actually bring down overall driving risk as they become more mainstream!

Q8: How do you see AI governance evolving at the enterprise level?

Sastry Durvasula: AI governance is definitely an area that will get a lot of attention over the next 18-24 months and beyond as more and more AI models are implemented by firms across various industries. Operationalizing AI systems is a complex multi-step process that is also complicated by the fact that AI models can drift in performance, especially if they have feedback loops and are training continuously. In addition, AI models can vary in the degree of autonomy – for example, a low autonomy model that supports human augmentation may require less governance as opposed to completely autonomous systems that necessitate a very high degree of governance.

At the very least we see the following issues being very key for AI governance in enterprise systems: explainability, interpretability, and accountability.

The first refers to explainability standards – understanding why an AI system is behaving in a specific way or even if the AI can be explained. This will be critical to improving the overall trust on the accuracy and appropriateness of the predictions. Also, the interpretability of AI algos and models will be a key feature. Finally, accountability tools, such as the ability to audit a model or ways to contest a prediction, will be needed.

Other important issues are the ability to stop biases from creeping into models as well as incorporating appropriate safety controls into the overall system. Safety can be improved with continuous monitoring to check whether the AI system violated any safety constraints, and automatic failover or human override in the case of any suspected safety breach. The quandary about how to limit biases converges with the dilemma around AI ethics – should ethical AI be approached through self-regulation in the development of AI tech, or by creating ‘moral machines’ where ethics and values are built into the machine. In either case, ethics is generally open to interpretation and is not yet in the legal framework.

In addition, as a risk management company, we are always on the lookout for liability issues for our enterprise clients. As the client implements AI, it has to be noted that some person or organization is still ultimately responsible for the actions of the AI systems under their control – no matter how complex or sophisticated the AI model is. On top of it, most enterprise systems will typically rely on AI models that have been developed by a tech company. In many such scenarios, it is not very clear where the liability lies in the case of an incident. For example, if an autonomous vehicle has an accident based on some AI model failure, it is not clear whether the vehicle manufacturer is liable or whether it is the AI software provider or maybe even the AI chip vendor! We are at very early stages of such complex liability frameworks and we may need to have governments stepping in with clear regulatory guidelines in such cases. These are very early days but again we expect to see a flurry of activity in this sector soon.

Q9: How are you attracting top talent in AI, analytics and other emerging tech areas?

Sastry Durvasula: Talent is a big focus area for us. We have been able to attract a number of engineering and product experts, and data science talent, with diverse industry backgrounds. In the US, we hired the head of Labs in Silicon Valley, the head of data science in New York, and built our digital hub in Phoenix. We recently launched global innovation centers in select locations to attract regional talent, and have been forging industry and academia alliances.

It is equally important to keep the team energized and provide cross-functional development opportunities. There are some very interesting and complex data, analytics and digital problems in the risk and insurance space as I discussed earlier. We focus on shedding light on them, building an agile culture, and fostering experimentation.

As an example, we launched a global colleague hackathon called #marshathon that had amazing response and participation. The winning teams get to partner with our Labs to incubate the idea and launch in-market pilots. We also launched the first-ever all-women hackathon in the industry called #ReWRITE, for Women, Risk, Insurance, Tech and Empowerment, in the US and Europe working with Girls in Tech and other industry partners. It was a great opportunity for women technologists from universities, startups and other corporations to network, learn and hack some innovative ideas utilizing AI, IoT, blockchain and other digital technologies.

Q10: Have you seen any significant or notable changes in the risk and insurance industry from when you started?

Sastry Durvasula: Where there is risk, there is opportunity. We are seeing increased momentum and significant investments in digital, data and analytics, and InsurTech is gaining speed. Digital has become a Board level topic in the industry. New collaborations and consortia are forming, especially leveraging the power of Blockchain and other emerging technologies.

There are as many opportunities as there are challenges both on the demand side and the supply side of the value chain. The rapidly changing cyber risk landscape, increased surface area with IoT devices, autonomous vehicles, sharing and gig economy, and other Industry 4.0 advancements are bringing new opportunities while adding new complexities in a tightly regulated environment.

Legacy operational systems are the delimiters for the industry to fully capitalize on these opportunities and address the challenges, and companies need to make digital transformation a strategic and relentless priority.  As Yoda would say, “Do. Or do not. There is no try.”

————————–
Sastry-Durvasula_pic
Sastry Durvasula
Chief Digital and Chief Data & Analytics Officer, Marsh

Sastry is CDO and CDAO of Marsh, the world’s leading insurance broker and risk adviser. He leads the company’s digital, data and analytics strategy and transformation, while building new digital-native businesses and growth opportunities. This includes development of innovative digital platforms and products, data science & modelling, client-facing technology, and digital experiences across global business units. In his previous role at American Express, Sastry led global data and digital transformation across the lifecycle of cardmembers and merchants, driving innovation in digital payments and commerce, big data, machine learning, and customer experience.

Sastry plays a leading role in industry consortia, CDO/CIO forums, FinTech/InsurTech partnerships, and building academia/research affiliations. He is a strong advocate for diversity & inclusion and is on the Board of Directors for Girls in Tech, the global non-profit that works to put an end to gender inequality. Sastry launched an industry-wide initiative called #ReWRITE focused on Women, Risk, Insurance, Technology & Empowerment. He holds a Master’s degree in Engineering, is credited with 20+ patents and has been the recipient of several industry awards for innovation and leadership.

Resources

The Ethics of Artificial Intelligence, Frankfurt Big Data Lab.

Related Posts

On The Global AI Index. Interview with Alexandra MousavizadehODBMS Industry Watch, 2020-01-18

On Innovation and Digital Technology. Interview with Rahmyn KressODBMS Industry Watch ,2019-09-30

On Digital Transformation, Big Data, Advanced Analytics, AI for the Financial Sector. Interview with Kerem TomakODBMS Industry Watch, 2019-07-08

Follow us on Twitter: @odbmsorg

##