The monthly blog party that is known as T-SQL Tuesday has hit its 100th episode (and I only missed 98 of them!). In true Olympic fashion, it returns to its roots for its centennial celebration. In other words, our host this month is Adam Machanic (b|t) himself. And even though 100 is a perfectly valid number to look back on, he decided that looking ahead is more fun. He asks all bloggers to whip out the crystal ball and predict what our world will look like in another 100 months, when (hopefully) T-SQL Tuesday #200 will be celebrated.
I must admit that, looking at the news, looking at current trends in the world, it is easy to get very pessimistic about the future. But as a father of two fantastic children, I decided long ago that I can no longer afford to wallow in pessimistic visions of the future. So let’s all assume that we as the collective human race will not destroy our world in any of the dozens of ways we could do that. Let’s assume we will all be here in another 100 months, and be able to look back at these posts and laugh at how wrong we were.
So I will not look at politics, environment, social movements, or any other global trends. I will focus on trends in the nice shielded world of data professionals. And within that subset, I have decided to pick four key points, three because they are or have been trending in the last few years and one because that’s where my heart is.
Cloud computing

Cloud computing has definitely taken off. I will admit that I was on the wrong side of this debate for way too long. I fully expected that most companies would prefer not to have their valuable data stored on someone else’s computer. But the benefits of cloud computing are too real, so it has taken off big time. Having your application and/or your data in the cloud means that your employees can connect to it from wherever they are, using whatever device they have, without you having to provide the infrastructure. This is great for enabling sales people, drivers, or support staff who get sent to a client site. It also makes it a lot easier to offer remote working to your employees.
Because of those benefits, I now expect this trend to continue even more strongly than it already has. In all the more technologically advanced areas of the world, I expect that in a few years we will all expect a wireless internet signal to be available everywhere. This will be used by personal devices (we’d call them phones now, but their name may change) that everyone has as a replacement for many different things we now need to keep: phone, car keys, house keys, agenda, wallet, tickets, remote control, etc. The device itself will have limited storage and computing power; it will rely on that fast internet connection that is assumed to always be available. Internet outages will become REALLY interesting!
More and more companies will move towards serverless. The computers used in the offices will connect to cloud computers where all application logic is done and all data is stored.
There will be some exceptions to this. For companies that have really large volumes of data and really complex programs running, the cost of buying all that “as a service” will be prohibitive, and it will be cheaper to have on-premises hardware and the accompanying support staff. There may also be companies where the sensitivity of the data continues to stand in the way of moving to the cloud. And some companies will keep critical applications on their own hardware because they do not want to run the risk of downtime when an internet provider has issues.
Overall, however, I predict that in 100 months, most applications that are now hosted and maintained on premises will have moved to the cloud, or someone will be working on moving them there.
Artificial intelligence and machine learning
There has been a lot of progress in the AI area. One of the areas where I was recently really impressed is voice recognition. I was in the car, navigation app open on my phone, which was on the dashboard. I clicked the speak button and muttered the name of a company and a city. Over the background noise of the running engine and the playing radio, it recognized what I had said, found the address of the company’s offices, and provided me with driving directions (avoiding a traffic jam along the way).
However, I also noticed its limitations. When I got a call that the meeting had been cancelled, I said “Hey Google! Bestemming wissen” (which is Dutch for “erase destination”). It computed a new route for a three-hour drive. I later found that this route was to the German city Wissen (written like the Dutch word for erase, but pronounced differently). Clearly, the system has not been programmed to recognize normal Dutch instructions to erase a destination, and responded with the best match it could find.
These problems will disappear over time. Siri and Cortana are already beyond speech-to-text transformation and are starting to get better at interpreting the meaning of spoken instructions. This trend too will continue. Remember those personal unified devices I mentioned in the previous paragraph? They will not have an on-screen keyboard anymore; they will be purely voice driven. And they will become smarter and smarter. Application developers may need to start looking for other employment as software becomes smart enough to understand a spoken request and find on its own the best way to fulfill it. There will still be demand for people to code the more complex applications, but standard reports and dashboards will become entirely machine-generated.
But AI and ML are much more than just voice recognition and voice-based interaction. There is also a huge trend nowadays towards using machine learning algorithms to analyze huge collections of data to train a predictive model that is then applied to new cases to predict the most likely outcome. And in this area, I expect problems. Not because of the algorithms themselves, but because of the people. When a computer pops up a conclusion, people tend to trust the algorithms and blindly follow the recommendations. So sales people will no longer call all leads, but only the 30% marked as most likely to respond. And when the prediction is wrong and a very viable prospect is never called, no one will ever know, so this problem will not heal itself. Until things really start to go wrong. Until an AI algorithm at an airport points out the people in line that need to be inspected and fails to point out someone who then turns out to be a bad person with bad plans. Or until something else terrible happens and everybody points at anonymous algorithms in unknown computers that base their conclusions on recognizing patterns in thousands of known inputs.
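To make the broken feedback loop concrete, here is a minimal pure-Python sketch of the “call only the top 30%” workflow, assuming a model that has already scored each lead; all company names and probabilities below are invented for illustration.

```python
# Hypothetical sketch: rank sales leads by a model's predicted response
# probability and call only the top 30%. Names and scores are invented.

def select_leads(scored_leads, fraction=0.30):
    """Return the top `fraction` of leads, ranked by predicted probability."""
    ranked = sorted(scored_leads, key=lambda lead: lead[1], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    return ranked[:cutoff]

# Invented (name, predicted probability) pairs from some trained model.
leads = [
    ("Acme", 0.91), ("Globex", 0.15), ("Initech", 0.78),
    ("Umbrella", 0.40), ("Hooli", 0.66), ("Stark", 0.05),
    ("Wayne", 0.85), ("Wonka", 0.52), ("Tyrell", 0.31), ("Cyberdyne", 0.12),
]

to_call = select_leads(leads)
print([name for name, _ in to_call])  # → ['Acme', 'Wayne', 'Initech']

# The feedback loop is broken: a viable prospect scored below the cutoff
# (a false negative) is never called, so the model's mistake is never observed
# and never corrected.
never_called = [name for name, p in leads if (name, p) not in to_call]
```

The point of the sketch is the last line: the 70% below the cutoff simply vanish from everyone’s view, which is exactly why the error never heals itself.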
The picture to the right (click to enlarge) is a screenshot that shows the result of feeding the T-SQL Tuesday logo to a publicly available visual recognition website. Yes, it recognizes Tuesday with 99% certainty. But all other weekdays are also mentioned with high-90s certainty. This is still a beta, and scientists are working day after day to improve it. One day we will have software that uses the results of visual recognition as input to a prediction model trained by machine learning. And we will all blindly follow the recommendations – until the stinky stuff starts hitting the fan because either the recognition was incorrect, or the model made a bad prediction, or both.
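If those weekday labels are meant to be mutually exclusive, their confidences cannot all sit in the high 90s at once. Here is a hedged pure-Python sketch of a sanity check that would catch that; the scores are invented to resemble the screenshot, not taken from any real service.

```python
# Hypothetical sanity check: scores over mutually exclusive labels should
# behave like a probability distribution and sum to roughly 1.0.

def plausible_distribution(scores, tolerance=0.05):
    """True if the scores could form a probability distribution over
    mutually exclusive labels (i.e. they sum to roughly 1.0)."""
    return abs(sum(scores.values()) - 1.0) <= tolerance

# Invented confidences resembling the screenshot described above.
weekday_scores = {
    "Tuesday": 0.99, "Monday": 0.97, "Wednesday": 0.96,
    "Thursday": 0.95, "Friday": 0.94, "Saturday": 0.93, "Sunday": 0.92,
}

print(plausible_distribution(weekday_scores))  # → False (scores sum to 6.66)
```

Of course, if the service scores each label independently (a multi-label detector), the check does not apply – but then a 99% score for “Tuesday” means something much weaker than most users assume.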
My prediction is that, 100 months from now, many data professionals will be licking their wounds, managers will be doing damage control, and the world will start to exhibit a healthy distrust of machine-generated predictions where we cannot see exactly what reasoning led to the prediction. Machine learning and predictive models will remain in use, but only in areas where they are “safe”. In other areas their results will at best be treated as gentle recommendations.
Data protection and privacy
A subject that has always been of interest to data professionals, but is now also getting a lot of attention at management level thanks to the GDPR legislation, is protecting data from unauthorized access and protecting the privacy of the people you collect data about. At this time a lot of companies are struggling to become GDPR compliant. Some will finish in time, some will finish a bit late, and a few are not even trying. Once the EU sets a few examples by dishing out harsh penalties for non-compliance, the remaining companies will scurry to either get compliant or rigorously shut out all EU customers so they don’t have to.
But other countries will follow the EU’s example of setting stricter guidelines (not all; some countries will continue to believe in self-regulation). Customers will start to demand more privacy guarantees, even in areas where less strict laws apply. And press coverage of large-scale data breaches and privacy violations will continue to fuel that fire. A hundred months from now, every serious company will have better data protection plans than they have now, and will be more privacy-friendly in their terms and conditions.
But that does not mean that there will be no more scandals. Those, too, will continue. Just like neither laws nor change in cultural perception will ever fully root out drunk driving, there will always be incidents where data is leaked or privacy is violated. Sometimes because people tend to cut corners (usually for all the right reasons, but it can still backfire). Sometimes because companies try to save money (usually for the wrong reasons, because short-term gain is easier to explain to shareholders or directors than long-term safety).
I also expect that by that time, the GDPR restrictions that are now considered to be very strict will have been surpassed. At that time other countries will be working on legislation (or maybe will already have completed it) that is stricter than GDPR. One area where I mainly expect additional rules is the use of data collected about an individual as input to train predictive models. Because of the incidents described in the previous paragraphs, large percentages of the public will have become aware of machine learning. Many of them will not accept that “their” data is used without their consent to train these models, and politicians will start to push for regulation – either because they agree with those people, or because they want their votes in the next election.
Performance tuning and execution plans
So here is my predicted future. Computing is done in the cloud, through personal devices that everyone is carrying and that are always connected. They use voice recognition, and most requests are interpreted by smart algorithms, the way that currently the SQL Server optimizer figures out how to perform a task based on a query. Data is stored in that same cloud, but subject to strict rules as to who can use what data for what purpose. Some companies will still have on-premises computing, but they will be a minority. What does this mean for the future of someone like me: someone who specializes in query tuning, someone who loves to pore over an execution plan, find the reason for a slow response, and fix it? Will I still have work?
I think I will, but my work will be different from what it is now. And I think a lot of people currently doing other development work will be moving to “my” new line of work.
The parallel I drew between computers responding to voice commands and SQL Server compiling an execution plan for a query is very deliberate. There will be large similarities between the two processes. When you say “Hey Cortana! My parents will be over tonight so come up with a good meal that they will like and have the shop deliver the ingredients to my home by 4PM”, the computer will have to parse the request to make sure it understands what you say, then come up with a plan to do this efficiently (identify who your parents are, fetch data accessible to you on food preferences and allergies of these people, select recipes that qualify, identify a store near you that can deliver on time, place order). That plan may sometimes be flawed, either causing incorrect results or just processing slower than it should, and in those cases a root cause needs to be found and a fix deployed. That will be my job, and I will still be looking at execution plans – using operators that are slightly different from what we now see in SQL Server execution plans, but probably not even that different. Identifying your parents is probably still an Index Seek in a public data collection!
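As a rough illustration of that analogy, here is a hypothetical Python sketch of the dinner request compiled into an ordered plan of steps, loosely mirroring a query execution plan; every step, name, and data source below is invented.

```python
# Hypothetical sketch: a voice request compiled into an ordered plan,
# loosely analogous to a query execution plan. All steps are invented.

def execute_plan(plan, context):
    """Run each plan step in order, passing the shared context along."""
    for step in plan:
        context = step(context)
    return context

def identify_parents(ctx):      # analogous to an Index Seek on public data
    ctx["guests"] = ["mother", "father"]
    return ctx

def fetch_preferences(ctx):     # lookup restricted to data the user may access
    ctx["constraints"] = {"mother": "no nuts", "father": "loves fish"}
    return ctx

def select_recipe(ctx):         # filter candidate recipes against constraints
    ctx["recipe"] = "grilled salmon"
    return ctx

def place_order(ctx):           # pick a store that can deliver by 4 PM
    ctx["order"] = f"ingredients for {ctx['recipe']} by 4 PM"
    return ctx

plan = [identify_parents, fetch_preferences, select_recipe, place_order]
result = execute_plan(plan, {})
print(result["order"])  # → ingredients for grilled salmon by 4 PM
```

Tuning such a plan would feel familiar: a slow or wrong answer means inspecting which step misbehaved and why – the same root-cause hunt we do in execution plans today.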
So the next question is: who will we be working for? Well, for the public at large, the service of connecting to the internet and asking it questions will be based on a monthly fee. It may be a flat fee or it may be usage based, but it will not be based on the compute power involved in your requests – just the number of requests. So it will be the provider that benefits from serving the highest number of requests using the least amount of resources. So for requests as described above, it will be the service providers who employ optimization specialists.
Companies will have different contracts. They will pay for the amount of data retrieved from various sources, for the amount of data stored, and for the compute resources used to satisfy the requests that the CEO blurts into his reporting tool. These companies will need to employ their own optimization specialists, to help the CEO phrase his request such that it will not break the bank. And finally, the few companies that still have their own computers will obviously also buy licenses to use these same algorithms and will therefore also have a need for optimization specialists.
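Under such a contract, the gap between a carelessly phrased request and a tuned one is easy to picture. A hypothetical sketch, with all rates and request profiles invented purely for illustration:

```python
# Hypothetical billing model: pay per GB retrieved, per GB stored, and per
# compute-second. All rates and numbers below are invented for illustration.

RATES = {"retrieved_gb": 0.50, "stored_gb": 0.02, "compute_sec": 0.10}

def request_cost(retrieved_gb, stored_gb, compute_sec):
    """Cost of one request under the invented pay-per-resource contract."""
    return (retrieved_gb * RATES["retrieved_gb"]
            + stored_gb * RATES["stored_gb"]
            + compute_sec * RATES["compute_sec"])

# The CEO's naive phrasing scans everything; the tuned phrasing retrieves
# only what the report actually needs.
naive = request_cost(retrieved_gb=500, stored_gb=100, compute_sec=3600)
tuned = request_cost(retrieved_gb=5, stored_gb=100, compute_sec=40)
print(f"naive: {naive:.2f}, tuned: {tuned:.2f}")
```

The storage cost is the same either way; it is the retrieval and compute terms that the optimization specialist earns their keep on.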
The road ahead of us will be interesting. I really do not know how many of the changes I predict will actually come true. But one thing is sure: there will be change. And the pace of change will continue to increase.
If you do not want to fall behind, better make sure you stay on top of the change, and on top of emerging new technologies.