SQL Server 2014 was released on April 1, 2014 with a series of new features, delivering a new version of the data platform squarely focused on high performance, the cloud, and insightful analysis of all types of data.
This new version addresses companies' business challenges by providing:
Performance for mission-critical projects thanks to the in-memory engine, which now covers OLTP, data warehousing, and BI
Even more service availability and security
A move toward hybrid platforms, with the ability to combine on-premises and cloud deployments in the same architecture, which in turn delivers cost savings
Insightful analysis of all types of data, available in many forms, with highly relevant reporting through familiar tools such as Excel
Next week is TechEd, so it's off to Texas and a bunch of presentations, meetings, etc. Might get to read a few books on the flight though, so that's a win. Here are some links I found this week that were interesting.
For mission-critical databases, it is essential to put in place permission-management solutions that are both rigorous and effective.
To do so, a solid understanding of the fundamental principles and mechanisms of permission management in the SQL Server relational engine is indispensable.
In this article, we present a concise overview of the different aspects of the permission model, along with best practices for putting it into practice.
At Microsoft, we have an important program in place to work closely with our customers to ensure high-quality, real-world testing of Microsoft SQL Server before it hits the market for general availability. Internally, we call this the Technology Adoption Program (TAP). It works like this: an exclusive list of customers are invited to collaborate with us very early in the development lifecycle, and together, we figure out which features they benefit the most from testing and which workload (or scenario) they will use. They test the upgrade process, and then exploit the new feature(s), as applicable. Many of these customers end up moving their test workloads into their production environments up to six months prior to the release of the final version. The program obviously benefits Microsoft because no matter how well we test the product, it is real customer workloads that determine release quality. Our select customers benefit because they are assured that their workloads work well on the upcoming release, and they have the opportunity to work closely with the SQL Server engineering team.
Microsoft SQL Server 2014 is now generally available, and we believe you will enjoy this release for its exciting features: In-Memory OLTP; Always-On enhancements, including new hybrid capabilities; Column Store enhancements; cardinality estimate improvements, and much more. I also believe you will be happy with my favorite feature of all, and that is “reliability.” For an overview on the new features in SQL Server 2014, see the general release announcement.
To give you a better feel for this pre-release customer validation program, I will describe a few examples of customer workloads tested against SQL Server 2014 prior to the release of the product for general availability.
The first customer example is the world’s largest regulated online gaming company. Hundreds of thousands of people visit this company’s website every day, placing more than a million bets on a range of sports, casino games, and poker. SQL Server 2014 enables this customer to scale its applications to 250k requests per second, a 16x increase from the 16k requests per second on a previous version of SQL Server, using the same hardware. In fact, due to performance gains, they were able to reduce the number of servers running SQL Server from eighteen to one, simplifying the overall data infrastructure significantly. The transaction workload is the session state of the online user, which not only has to manage tens of thousands of customers but also needs to respond quickly and be available at all times to ensure high customer satisfaction. The session state application, written in ASP.NET, uses heavily accessed SQL Server tables that are now defined as “memory-optimized,” which is part of one of the exciting new capabilities of SQL Server 2014, In-Memory OLTP. The performance gain significantly improves the user’s experience and enables a simpler data infrastructure. No application logic changes were required in order to get this significant performance bump. This customer’s experience with SQL Server 2014 performance and reliability was so good, they went into production more than a year before we released the product.
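As a rough sketch of the kind of change involved (the table, columns, and sizes here are hypothetical rather than the customer's actual schema, and the database is assumed to already have a memory-optimized filegroup), a session-state table in SQL Server 2014 can be declared memory-optimized like this:
CREATE TABLE dbo.SessionState
(
    -- In SQL Server 2014, character index key columns on memory-optimized tables require a BIN2 collation
    SessionId   VARCHAR(64) COLLATE Latin1_General_100_BIN2 NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    UserId      INT             NOT NULL,
    SessionData VARBINARY(4000) NULL,   -- LOB types are not supported on memory-optimized tables in 2014
    LastAccess  DATETIME2       NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
Existing T-SQL continues to query such a table as usual, which is consistent with the customer's report that no application logic changes were required.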
The second customer example is a leading global provider of financial trading services, exchange technology, and market insight. Every year, the customer adds more than 500 terabytes of uncompressed data to its archives and has to perform analytics against this high volume of data. As you can imagine, this high volume of data not only costs a lot to store on disk, it can take a long time to query and maintain. To give you a sense of scale of this customer’s data volume, let me give you a few examples: one of the financial systems processes up to a billion transactions in a single trading day; a different system can process up to a million transactions per second; the data currently collected is nearly two petabytes of historical data. The cost savings on storage of 500+ terabytes of data, now compressed by ~8x using SQL Server 2014 in-memory columnstore for data warehousing indexes, provides an easy justification to upgrade, especially now that the in-memory columnstore is updatable. Significantly faster query execution is achieved due to the reduction in IO, another benefit of the updatable columnstore indexes and compressed data. This customer deployed SQL Server 2014 in a production environment for several months prior to general availability of the product.
My third example is a customer that provides data services to manufacturing and retail companies; the data services enable such companies to better market and sell more product. The closer this data services company can get to providing real-time data services, the more customers their partners can reach and the better customer satisfaction their partners can provide when using the service. Before SQL Server 2014, the data services company designed their application utilizing cache and other techniques to ensure data (e.g., a product catalog) was readily available for customers. In this scenario, processing speed is important, and even more important than speed is data quality or “freshness,” so if the database can provide faster access to data persisted in the database rather than a copy in a cache, this ensures the data is more accurate and relevant. SQL Server 2014 In-Memory OLTP technology enables them to eliminate the application-tier cache and to scale reads and writes within the database. Data load performance improved 7x–11x. The In-Memory OLTP technology, by eliminating locking/latching, removed any lock contention that they might have previously experienced on read/write operations to the database. The performance gains were so compelling, this company went into production with SQL Server 2014 four months prior to general release.
The Technology Adoption Program (TAP) is a great way to help all of us ensure that the final product has a proven high-quality track record when released. These three customers—and as many as a hundred others—have partnered with the SQL Server engineering team to ensure that SQL Server 2014 is well tested and high quality—maybe you can sleep a little better at night knowing you are NOT the first.
We are excited by the release of SQL Server 2014; check it out here.
Mark Souza, General Manager, Microsoft Azure Customer Advisory Team
Back from TechEd, where there were a lot of announcements from Microsoft about working with various open source things. There were so many sessions, it was impossible to attend all of them. Good thing they put the videos online.
Azure Redis Cache: There's a bunch of blogs talking about the new Redis cache and how to use it from .NET, but here's a page that has some PHP examples and links to clients for various other languages. In theory, you should be able to use it from any language with a Redis client.
ASP.NET vNext: There were a lot of interesting announcements about ASP.NET & .NET in general this week at TechEd. Like, ASP.NET vNext apps running on Mono on Mac and Linux.
PostgreSQL 9.4 beta 1: I don't normally post about database updates, but PostgreSQL has some interesting stuff going on here. Specifically working with JSON data, which is one of the big pulls of things like MongoDB. But JSON as a native type, queryable with honest-to-goodness SQL and side-by-side with relational data? That's pretty compelling.
Trash: A cross-platform delete command that puts things in the trash vs. permanently deleting them.
When we talk about data quality with the SQL Server tools, customers generally choose between technical rules implemented in SSIS for automated processing and business rules implemented in DQS (Data Quality Services) for management that is accessible to end users.
This article is the second in a series devoted to EIM, following the article « Solution EIM pour Dynamics CRM »; it explains how, on a project in production at one of our major accounts, we were able to combine the two approaches.
At TechEd North America we were excited to announce our plans for EF7, and even demo some very early features. This post will cover the announcements we made during the session. You can also watch the recorded session from TechEd (the EF7 content starts at 46:40). When watching the demos please bear in mind that this is a very early preview - not all the features shown are even submitted into the main code base yet.
Entity Framework is a popular data access choice for traditional client and server applications that target the full .NET Framework. This includes applications built with technologies such as WPF, WinForms and ASP.NET. As we look to the future, we believe there is value in providing the same programming model for data access on the remaining platforms where .NET development is common. This includes Windows Store, Windows Phone and the Cloud Optimized .NET that was announced at TechEd. EF7 will work on all of these platforms as well as Mono, on both Mac and Linux.
For Windows Phone and Windows store, the initial goal is to provide local data access using EF. SQLite is the most common database of choice on devices, and will be our primary story for local data with EF7. The full provider model will be available though, so other data stores can be supported also.
In the TechEd session you can see a demo of EF7 being used in a Universal App (targeting Windows and Windows Phone) starting at 1:04:00.
EF7 Enables New Data Stores
While parts of Entity Framework are clearly tied to relational data stores, much of the functionality that EF provides is applicable to many non-relational data stores too. Examples of such functionality include change tracking, LINQ, and unit of work. In EF7 we will be enabling providers that target non-relational data stores, such as Azure Table Storage.
We are explicitly not trying to build an abstraction layer that hides the type of data store you are targeting. The common patterns/components that apply to most data stores will be handled by the core framework. Things that are specific to particular types of data stores will be available as extensions that are included as part of the provider. For example, the concept of a model builder that allows you to configure your model will be part of the core framework. However, the ability to configure things such as cascade delete on a foreign key constraint will be included as extensions in the relational database provider.
In the TechEd session you can see a demo of EF7 being used to access Azure Table Storage starting at 56:45.
Challenges
As we look at what we want to achieve in EF7, we’ve had to take a realistic look at our current code base. While it has served us well and allowed us to release some solid versions of EF in recent times, it is not really set up to achieve what we want to in EF7.
The EF code base has a long history, going back to the WinFS days, with parts of the code base being 10+ years old. The current code base makes extensive use of older APIs and design patterns. Many of the APIs that EF uses heavily are not available on the new platforms we want to target. In addition, the code base is monolithic in nature which makes it difficult to implement new features and increasingly harder to change things without breaking existing functionality.
When Entity Framework began life, its charter had more to do with the Entity Data Model vision than with being a best-of-breed O/RM. As a result, there are many seldom-used features and capabilities in the code base that hamper performance and complicate development, but are not feasible to remove due to the monolithic nature of the implementation. We also have a number of unintuitive behaviors that are ingrained in the framework and hard to change or remove for the same reasons.
While using fewer resources is desirable on all platforms, it becomes more important on devices, which typically have fewer resources than desktop and server machines. These concerns were not a driving factor when EF was designed and implemented. The current resource usage of EF hinders its ability to be a good data access offering on devices. We’ve done some investigation into lowering resource usage and found that it would require architectural changes amounting to a very large amount of work. As we move toward cloud computing, low resource usage is also desirable because it allows greater density of applications on servers – and is often directly tied to billing.
EF7 is Lightweight and Extensible
EF7 will be a lightweight and extensible version of Entity Framework that just pulls forward the commonly used features. In addition, we’ll be able to include some commonly requested features that would have been difficult to implement in the existing code base, but can be included from the start in EF7.
We’ll be keeping the same patterns and concepts that you are used to in Entity Framework, except where there is a compelling reason to change them. You’ll see the same DbContext/DbSet based API but it will be built over building block components that are easy to replace or extend as needed – the same pattern we’ve used for some of the isolated components we’ve added in recent EF releases.
What About My EF6 Apps
We will continue to invest in EF6 while we work to bring EF7 closer to feature parity with EF6. In fact, we implemented the EF6.1 and EF6.1.1 releases while working on the EF7 code base.
Upgrading from EF6 to EF7 is a key scenario for us, both in terms of existing code and existing knowledge. We’ll be keeping the same concepts and patterns wherever it makes sense. The upgrade to EF7 will require some changes to your code. Our aim is that code that uses the core functionality of the DbContext API will upgrade easily, while code that makes use of the lower-level APIs in EF may require more complicated changes.
Get Involved
It’s early days, but we are underway with the EF7 implementation. We’re developing in the EntityFramework GitHub repository and have nightly builds available. We aren’t able to accept contributions just yet, but we will in the near future.
If you want to try out the nightly builds, just bear in mind that this is pre-alpha code and there is a lot of functionality that is partially implemented or still to be added. Visit our documentation on GitHub for more information on getting started with nightly builds.
We’ll keep you updated with more details on EF7 and the features we are implementing over the coming months.
Dear Customers,
The 10th cumulative update release for SQL Server 2012 SP1 is now available for download at the Microsoft Support site. Cumulative Update 10 contains all the SQL Server 2012 SP1 hotfixes which have been available since the initial release of SQL Server 2012 SP1.
In today’s world of interconnected devices and broad access to more and more data, the ability to glean ambient insight from so many data sources has been made quite hard by the variety and speed with which data is being delivered. Think about it for a minute: your servers continue to provide interesting data about the operations happening in your business, but now you also have data coming from the temperature sensors in the A/C units, the power supplies, and the networking equipment in the data center, which can be combined to show that spikes in temperature and traffic have a dramatic effect on the life of a server. This type of contextual data is growing to include larger volumes and more detailed insights into the operations and management of your business. As we look to the future, Pew Research has released a report that predicts 50 billion connected devices by 2025. That is 5 devices for every person expected to be alive. With data coming from sources ranging from manufacturing equipment to jet airliners, from mobile phones to your scale, and to things we haven’t even imagined yet, the question really becomes how you take advantage of all of these data sources to provide insight into the current and future trends in your business.
In April 2014, Microsoft announced the Analytics Platform System (APS) as Microsoft’s “Big Data in a Box” solution for addressing this question. APS is an appliance solution with hardware and software that is purpose-built and pre-integrated to address the overwhelming variety of data while providing customers the opportunity to access this vast trove of data. The primary goal of APS is to enable the loading and querying of terabytes and even petabytes of data in a performant way using a Massively Parallel Processing version of Microsoft SQL Server (SQL Server PDW) and Microsoft’s Hadoop distribution, HDInsight, which is based on the Hortonworks Data Platform.
Basic Design
An APS solution is comprised of three basic components:
The hardware – the servers, storage, networking and racks.
The fabric – the base software layer for operations within the appliance.
The workloads – the individual workload types offering structured and unstructured data warehousing.
Utilizing commodity servers, storage, drives and networking devices from our three hardware partners (Dell, HP, and Quanta), Microsoft is able to offer a high-performance scale-out data warehouse solution that can grow to very large data sets while providing redundancy of each component to ensure high availability. Starting with standard servers and JBOD (Just a Bunch Of Disks) storage arrays, APS can grow from a simple two-node-plus-storage configuration to 60 nodes. At scale, that means a warehouse that houses 720 cores, 14 TB of RAM, 6 PB of raw storage, and ultra-high-speed networking using Ethernet and InfiniBand networks, while offering the lowest price per terabyte of any data warehouse appliance on the market (Value Prism Consulting).
The fabric layer is built using technologies from the Microsoft portfolio that enable rock solid reliability, management and monitoring without having to learn anything new. Starting with Microsoft Windows Server 2012, the appliance builds a solid foundation for each workload by providing a virtual environment based on Hyper-V that also offers high availability via Failover Clustering all managed by Active Directory. Combining this base technology with Clustered Shared Volumes (CSV) and Windows Storage Spaces, the appliance is able to offer a large and expandable base fabric for each of the workloads while reducing the cost of the appliance by not requiring specialized or proprietary hardware. Each of the components offers full redundancy to ensure high-availability in failure cases.
Workloads
Building upon the fabric layer, the current release of APS offers two distinct workload types – structured data through SQL Server Parallel Data Warehouse (PDW) and unstructured data through HDInsight (Hadoop). These workloads can be mixed within a single appliance, offering customers the flexibility to tailor the appliance to the needs of their business.
SQL Server Parallel Data Warehouse is a massively parallel processing, shared nothing scale-out solution for Microsoft SQL Server that eliminates the need to ‘forklift’ additional very large and very expensive hardware into your datacenter to grow as the volume of data exhaust into your warehouse increases. Instead of having to expand from a large multi-processor and connected storage system to a massive multi-processor and SAN based solution, PDW uses the commodity hardware model with distributed execution to scale out to a wide footprint. This scale wide model for execution has been proven as a very effective and economical way to grow your workload.
HDInsight is Microsoft’s offering of Hadoop for Windows based on the Hortonworks Data Platform from Hortonworks. See the HDInsight portal for details on this technology. HDInsight is now offered as a workload on APS to allow for on-premises Hadoop that is optimized for data warehouse workloads. By offering HDInsight as a workload on the appliance, the pressure to define, construct, and manage a Hadoop cluster has been minimized. And by using PolyBase, Microsoft’s SQL Server to HDFS bridge technology, customers can not only manage and monitor Hadoop through tools they are familiar with, but can for the first time use Active Directory to manage security for the data stored within Hadoop – offering the same ease of user management offered in SQL Server.
Massively-Parallel Processing (MPP) in SQL Server
Now that we’ve laid the groundwork for APS, let’s dive into how we load and process data at such high performance and scale. The PDW region of APS is a scale-out version of SQL Server that enables parallel query execution to occur across multiple nodes simultaneously. The effect is the ability to break what appears to be a single very large operation into tasks that can be managed at a smaller scale. For example, a query against 100 billion rows in a SQL Server SMP environment would require the processing of all of the data in a single execution space. With MPP, the work is spread across many nodes, breaking the problem into smaller, more manageable tasks. In a four-node appliance, each node is only asked to process roughly 25 billion rows – a much quicker task.
To accomplish such a feat, APS relies on a couple of key components to manage and move data within the appliance – a table distribution model and the Data Movement Service (DMS).
The first is the table distribution model that allows for a table to be either replicated to all nodes (used for smaller tables such as language, countries, etc.) or to be distributed across the nodes (such as a large fact table for sales orders or web clicks). By replicating small tables to each node, the appliance is able to perform join operations very quickly on a single node without having to pull all of the data to the control node for processing. By distributing large tables across the appliance, each node can process and return a smaller set of data returning only the relevant data to the control node for aggregation.
To create a table in APS that is distributed across the appliance, the user simply needs to specify the key on which the table is distributed:
CREATE TABLE [dbo].[Orders]
(
[OrderId] ...
)
WITH
(
DISTRIBUTION = HASH([OrderId])
)
This allows the appliance to split incoming data and place it onto the appropriate node in the appliance.
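By contrast, the smaller lookup tables mentioned above would typically be created as replicated tables; a minimal sketch (the table and columns are illustrative) looks like this:
CREATE TABLE [dbo].[Country]
(
    [CountryCode] CHAR(2)       NOT NULL,
    [CountryName] NVARCHAR(100) NOT NULL
)
WITH
(
    DISTRIBUTION = REPLICATE    -- a full copy of the table is kept on every compute node
)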
The second component is the Data Movement Service (DMS), which manages the routing of data within the appliance. DMS works in partnership with the SQL Server query engine (which creates the execution plan) to distribute the execution plan to each node. DMS then aggregates the results back to the control node of the appliance, which can perform any final execution before returning the results to the caller. DMS is essentially the traffic cop within APS that enables queries to be executed and data to be moved within the appliance across 2 to 60 nodes.
Performance
With the introduction of clustered columnstore indexes (CCI) in SQL Server, APS is able to take advantage of the performance gains to better process and store data within the appliance. In typical data warehouse workloads, we commonly see very wide table designs to eliminate the need to join tables at scale (to improve performance). The use of clustered columnstore indexes allows SQL Server to store data in columnar format rather than row format. This approach enables queries that don’t use all of the columns of a table to retrieve data from memory or disk more efficiently for processing – increasing performance.
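As a minimal illustration (reusing the hypothetical Orders table from the distribution example above, with its column list shortened), a distributed table can be stored as a clustered columnstore index directly at creation time:
CREATE TABLE [dbo].[Orders_CCI]
(
    [OrderId] INT NOT NULL
)
WITH
(
    DISTRIBUTION = HASH([OrderId]),
    CLUSTERED COLUMNSTORE INDEX    -- rows are stored column-wise and compressed
)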
By combining CCI tables with parallel processing and the fast processors and storage systems of the appliance, customers are able to improve overall query performance and data compression quite significantly versus a traditional single-server data warehouse. Oftentimes, this means reductions in query execution times from many hours to a few minutes or even seconds. The net result is that companies are able to take advantage of the exhaust of structured or unstructured data in real or near-real time to empower better business decisions.
Today we are pleased to announce the availability of EF6.1.1 Beta 1. This patch release includes a number of high priority bug fixes and some contributions from our community.
This is a preview of changes that will be available in the final release of EF6.1.1 and is designed to allow you to try out the new features and report any issues you encounter. Microsoft does not guarantee any level of support on this release.
All the changes we plan to include in the final release of EF6.1.1 are included in this Beta. If additional high priority bugs are reported on the Beta we will consider fixing them prior to the final release. Depending on the bug reports we get, we may provide another preview before the final release.
As a global IT leader, Dell manufactures some of the world’s most innovative hardware and software solutions. It also manages one of the most successful e-commerce sites. In 2013, the company facilitated billions in online sales. On a typical day, 10,000 people are browsing Dell.com at the same time. During peak online shopping periods, the number of concurrent shoppers can increase 100 times, to as many as one million people.
To help facilitate fast, frustration-free shopping despite traffic spikes, Dell has distributed the website’s online transaction processing (OLTP) load between 2,000 virtual machines, which include 27 mission-critical databases that run on Microsoft SQL Server 2012 Enterprise software and the Windows Server 2012 operating system. These databases, along with hundreds of web applications, are supported by Dell PowerEdge servers, Dell Compellent storage, and Dell Networking switches.
When Dell learned about SQL Server 2014 and its in-memory capabilities, the company immediately signed up to be an early adopter. Not only are memory-optimized tables in SQL Server 2014 lock-free—making it possible for numerous applications to simultaneously access and write to the same database rows—but the solution is also based on technologies that IT staff already know how to use.
Initially, engineers set up the memory-optimized tables to be fully durable, meaning that every committed change is written synchronously to the transaction log. However, developers can also configure delayed durability, which means that log writes are batched and flushed to disk slightly after commit to minimize any impact on performance.
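A rough sketch of the two options (the database and table names below are hypothetical): full durability is the default, and delayed durability can be allowed at the database level and then requested per transaction.
-- Permit delayed-durability commits in this database
ALTER DATABASE CURRENT SET DELAYED_DURABILITY = ALLOWED;

-- An individual transaction can then trade a short durability window for lower commit latency
BEGIN TRANSACTION;
    UPDATE dbo.ShoppingCart SET Quantity = Quantity + 1 WHERE CartId = 42;
COMMIT TRANSACTION WITH (DELAYED_DURABILITY = ON);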
By gaining the option to store tables in memory, Dell is achieving unprecedented OLTP speeds. “The performance increase we realize with In-Memory OLTP in SQL Server 2014 is astounding!” says Scott Hilleque, Design Architect at Dell. “After just a few hours of work, groups sped database performance by as much as nine times. And all aspects of our In-Memory OLTP experience has been seamless for our staff because it is so easy to adopt, and its implementation produces zero friction for architects, developers, database administrators, and operations staff.”
Although Dell is in the very early stages of adopting SQL Server 2014, IT workers are excited by the impact of In-Memory OLTP. The more the IT team can speed database performance, the faster web applications can get the information that they need to deliver a responsive and customized browsing experience for customers. Reinaldo Kibel, Database Strategist at Dell, summarizes: “In-Memory OLTP in SQL Server 2014 really signifies a new mindset in database development because with it, we no longer have to deal with the performance hits caused by database locks—and this is just one of the amazing benefits of this solution.”
You can read the full case study here and watch the video here.
Also, check out the website to learn more about SQL Server 2014 and start a free trial today.
Edgenet provides optimized product data for suppliers, retailers and search engines. Used online and in stores, Edgenet solutions ensure that businesses and consumers can make purchasing and inventory decisions based on accurate product information. Last year, it implemented an In-Memory OLTP solution built on SQL Server 2014, which has helped it continue to innovate and lead in its business.
We caught up with Michael Steineke, Vice President of IT at Edgenet, to discuss the benefits he has seen since Edgenet implemented SQL Server 2014.
Q: Can you give us a quick overview of what Edgenet does?
A: We develop software that helps retailers sell products in the home building and automotive industries. We work with both large and small dealers and provide software that helps determine and compare which products are in a local store.
We provide the specs, pictures, manuals, diagrams, and all the rest of the information that a customer would need to make an informed decision. We take all of this data, standardize it, and provide it to retailers and search engines.
With the major shift to online sales over the past handful of years, retailers need to have relevant and timely product information available so the customer can compare products and buy the best one for their needs.
In a single store, inventory is easy. In a chain where you have 1,000 or 5,000 stores, that gets very complicated. Our company is built on product data, and we need a powerful solution to manage it.
Q: What is your technology solution?
A: We are using In-Memory OLTP based on SQL Server 2014 to power our inventory search. This is where SQL Server 2014 comes in. Our applications make sure we have the right product listed, pricing and availability, and we couldn’t do it without In-Memory OLTP.
Q: What types of benefits have you seen since deployment?
A: SQL Server 2014 and OLTP have helped change our business. Our clients are happy as well. No matter what our customers need, we can do it with our solution. If a retailer wants to supply the data to us every 10 minutes, we can update every 10 minutes. It’s the way we like to do our business.
Q: Why did you choose to deploy SQL 2014 in your organization?
A: Working with Microsoft was a natural choice since we often are early adopters with new technologies. Our goal is to utilize new feature sets of new software as much as possible so we stay innovators in the field. That was the main reason we were so excited to deploy the In-Memory OLTP features with SQL Server 2014.
Q: What type of data are you managing?
A: Our inventory data isn’t extremely large, but there is a lot of volatility with it. We are talking about managing thousands of products across thousands of stores, with different pricing and availability for each store. There could be hundreds of millions of rows for just one retailer. Our big data implementation is around managing this volatility in the market, and we need a powerful back-end solution to help us handle all sorts of information.
Q: What are the advantages of In-Memory OLTP?
A: The biggest advantage we are getting is the ability to continually keep data up-to-date, so we always have real-time inventory and pricing. While we are updating we can continue to use the same tables, with little or no impact on performance. We were also able to consolidate a read-only database for the application, which was refreshed daily, and a database that consumed the updates from our customers into one in-memory database.
For more information about SQL Server 2014, check out the website and start a free trial today.
Virginia Tech is using the Microsoft Azure Cloud to create cloud-based tools to assist with medical breakthroughs via next-generation sequencing (NGS) analysis. This NGS analysis requires both big computing and big data resources. A team of computer scientists at Virginia Tech is addressing this challenge by developing an on-demand, cloud-computing model using the Azure HDInsight Service. By moving to an on-demand cloud computing model, researchers will now have easier, more cost-effective access to DNA sequencing tools and resources, which could lead to even faster, more exciting advancements in medical research.
We caught up with Wu Feng, Professor in the Department of Computer Science and Department of Electrical & Computer Engineering and the Health Sciences at Virginia Tech, to discuss the benefits he is seeing with cloud computing.
Q: What is the main goal of your work?
We are working on accelerating our ability to use computing to assist in the discovery of medical breakthroughs, including the holy grail of “computing a cure” for cancer. While we are just one piece of a giant pipeline in this research, we seek to use computing to more rapidly understand where cancer starts in the DNA. If we could identify where and when mutations are occurring, it could provide an indication of which pathways may be responsible for the cancer and could, in turn, help identify targets to help cure the cancer. It’s like finding a “needle in a haystack,” but in this case we are searching through massive amounts of genomic data to try to find these “needles” and how they connect and relate to each other “within the haystack.”
Q: What are some ways technology is helping you?
We want to enable the scientists, engineers, physicists and geneticists and equip them with tools so they can focus on their craft and not on the computing. There are many interesting computing and big data questions that we can help them with, along this journey of discovery.
Q: Why is cloud computing with Microsoft so important to you?
The cloud can accelerate discovery and innovation by computing answers faster, particularly when you don’t have bountiful computing resources at your disposal. It enables people to compute on data sets that they might not have otherwise tried because they didn’t have ready access to such resources.
For any institution, whether a company, government lab or university, the cost of creating or updating datacenter infrastructure, such as the building, the power and cooling, and the raised floors, just so a small group of people can use the resource, can outweigh the benefits. Having a cloud environment with Microsoft allows us to leverage the economies of scale to aggregate computational horsepower on demand and give users the ability to compute big data, while not having to incur the institutional overhead of personally housing, operating and maintaining such a facility.
Q: Do you see similar applications for businesses?
Just as the Internet leveled the playing field and served as a renaissance for small businesses, particularly those involved with e-commerce, so will the cloud. By commoditizing “big data” analytics in the cloud, small businesses will be able to intelligently mine data to extract insight with activities, such as supply-chain economics and personalized marketing and advertising.
Furthermore, quantitative analytic tools, such as Excel DataScope in the cloud, can enable financial advisors to accelerate data-driven decision-making via commoditized financial analytics and prediction. Specifically, Excel DataScope delivers data analytics, machine learning and information visualization to the Microsoft Azure Cloud.
In any case, just like in the life sciences, these financial entities have their own sources of data deluge. One example is trades and quotes (TAQ), where the amount of financial information is also increasing exponentially. Unfortunately, to make the analytics process on the TAQ data more tractable, the data is often triaged into summary format, which can inadvertently filter out critical data that should have been kept.
Q: Are you saving money or time or experiencing other benefits?
Back when we first thought of this approach, we were wondering if it would even be a feasible solution for the cloud. For example, with so much data to upload to the cloud, would the cost of transferring data from the client to the cloud outweigh the benefits of computing in the cloud? With our cloud-enabling of a popular genome analysis pipeline, combined with our synergistic co-design of the algorithms, software, and hardware in the genome analysis pipeline, we realized about a three-fold speed-up over the traditional client-based solution.
Q: What does the future look like?
There is big business in computing technology, whether it is explicit, as in the case of personal computers and laptops, or implicit, as in the case of smartphones, TVs or automobiles. Just look how far we have come over the past seven years with mobile devices. However, the real business isn’t in the devices themselves, it’s in the ecosystem and content that supports these devices: the electronic commerce that happens behind the scenes. In another five years, I foresee the same thing happening with cloud computing. It will become a democratized resource for the masses. It will get to the point where it will be just as easy to use storage in the cloud as it will be to flip a light switch; we won’t think twice about it. The future of computing and data lies in the cloud, and I’m excited to be there as it happens.
For more information about Azure HDInsight, check out the website and start a free trial today.
Dear Customers,
We are planning to ship one last Service Pack for both SQL Server 2008 and SQL Server 2008 R2. Because of the maturity of SQL Server 2008 and 2008 R2, these Service Packs will be an exception in terms of timing and will ship...
Implementing business solutions in the enterprise requires alignment with the challenges of today's business world: maximum availability and accessibility, peak loads, growing data volumes...
SQL Server 2014 exposes a range of hybrid cloud features to address these challenges, notably flexible high-availability and data-protection solutions based on Windows Azure.
Building your first Grunt plugin: If you're using Grunt, there will come a time when you need to write your own plugin. Might as well learn how to do it now.
Microsoft and SalesForce partnership: This is one of those things that touches a lot of things. Devices, Office, OneDrive, etc. It'll be interesting to see everything that comes out of this.
This blog post will highlight PolyBase’s truly unique approach focusing on:
Query capabilities across various heterogeneous data sources – on-premises and in the cloud (Microsoft Azure) – bringing Microsoft data services together to form one complete data platform solution
Total freedom for users with no lock-in, agnostic to the actual Hadoop distribution and/or underlying operating system
Faster insights from all your data in a simple and performant fashion, allowing users to leverage their existing tools and SQL scripts
1. Bringing the relational world together with Hadoop & Cloud (Azure)
In the very recent past, various SQL-over-Hadoop/HDFS solutions have been developed, such as Impala, HAWQ, Stinger, SQL-H, and Hadapt, to name just a few. While there are clear technical differences between the various solutions, at a high level they are similar in offering a SQL-like front end over data stored in HDFS.
So, is PolyBase yet another similar solution competing with these approaches? The answer is yes and no. At first glance, PolyBase is a T-SQL front end that allows customers to query data stored in HDFS. However, with the recently announced Analytics Platform System (APS), we have updated PolyBase with new syntax to highlight our extensible approach. With PolyBase, we bring various Microsoft data management services together and allow appliance users to leverage a variety of Azure services. This enables a new class of hybrid scenarios and reflects the evolution of PolyBase into a true multi-data-source query engine. It allows users to query their big data – regardless of whether it is stored in an on-premises Hadoop/HDFS cluster, Azure storage, Parallel Data Warehouse, or other relational DBMS systems (offered in a future PolyBase release).
Complete Data Platform with PolyBase as key integrative component
2. Freedom of Choice
Openness
One important key differentiator of PolyBase compared to all of the existing competitive approaches is ‘openness’. We do not force users to decide on a single solution, like some Hadoop providers are pursuing. With PolyBase, you have the freedom to use an HDInsight region as a part of your APS appliance, to query an external Hadoop cluster connected to APS, or to leverage Azure services from your APS appliance (such as HDInsight on Azure).
To achieve this openness, PolyBase builds on three T-SQL building blocks: external data sources, external file formats, and external tables.
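A minimal sketch of the three statements is shown below; the names, addresses, and column definitions are hypothetical placeholders, and the exact options can vary by APS/PDW release.
-- 1. External data source: where the data lives (here, an external Hadoop cluster)
CREATE EXTERNAL DATA SOURCE ExternalHDP_DS
WITH (TYPE = HADOOP, LOCATION = 'hdfs://hdp-headnode:8020')

-- 2. External file format: how the files are encoded
CREATE EXTERNAL FILE FORMAT DelimText
WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = '|'))

-- 3. External table: a relational schema layered over the files, queryable with ordinary T-SQL
CREATE EXTERNAL TABLE SensorData_ExternalHDP
(
    MachineKey   INT   NOT NULL,
    Temperature  FLOAT NOT NULL,
    YearMeasured INT   NOT NULL
)
WITH (LOCATION = '//Sensor_Data/', DATA_SOURCE = ExternalHDP_DS, FILE_FORMAT = DelimText)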
A user can now create statistics for each of the external tables shown above to improve the query performance. We extended SQL Server’s mature stats framework to work against external tables in the same way it works against regular tables. Statistics are crucial for the PolyBase query engine in order to generate optimal execution plans and to decide when pushing computation into the external data source is beneficial.
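For example, statistics on one of the external table's columns (the column choice below is illustrative) are created exactly as they would be on a regular table:
CREATE STATISTICS Temperature_Stats
ON SensorData_ExternalHDP (Temperature)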
Performance
While other SQL-over-Hadoop solutions (e.g. Impala, Stinger, and HAWQ) have improved, it remains true that they still cannot match the query performance of a mature relational MPP system. With PolyBase, the user can import data in a very simple fashion into PDW (through a CTAS statement, see below), use the fast SQL Server columnstore technology along with the MPP architecture, or let the PDW/PolyBase query optimizer decide which parts of the query get executed in Hadoop and which parts in PDW. This optimized querying, called split-based query processing, allows parts of the query to be executed as Hadoop MapReduce (MR) jobs that are generated on the fly, completely transparently to the end user. In doing so, the PolyBase query optimizer takes into account parameters such as the spin-up time for MR jobs and the generated statistics to determine the optimal query plan.
In general, when it comes to performance, the answer usually is ‘it depends on the actual use case/query’. With PolyBase, the user has total freedom and can leverage the capabilities of PDW and/or Hadoop based on their actual needs and application requirements.
PolyBase in APS bridging the gap between the relational world, Hadoop (external or internal) and Azure
The T-SQL statement below runs across all data sources, combining structured appliance data with un/semi-structured data in an external Hadoop cluster, an internal HDInsight region, and Azure (e.g. historical data):
SELECT Machine_Information_PDW.machine_name, Machine_Information_PDW.location
FROM Machine_Information_PDW, Old_SensorData_Azure, SensorData_HDI, SensorData_ExternalHDP
WHERE Machine_Information_PDW.MachineKey = Old_SensorData_Azure.MachineKey
  AND Machine_Information_PDW.MachineKey = SensorData_HDI.MachineKey
  AND Machine_Information_PDW.MachineKey = SensorData_ExternalHDP.MachineKey
  AND SensorData_HDI.Temperature > 80
  AND Old_SensorData_Azure.Temperature > 80
  AND SensorData_ExternalHDP.Temperature > 80
This query example shows how simplicity and performance are combined at the same time. It shows three external tables referring to three different locations plus one regular (distributed) PDW table. While executing the query, the PolyBase/PDW query engine will decide, based on the statistics, whether or not to push computation to the external data source (i.e. Hadoop).
Rewriting & Migrating existing applications
Finally, you may have heard that Hadoop is ‘cheaper’ than more mature MPP DBMS systems. However, what you might not have heard about is the cost associated with rewriting existing applications and ensuring continued tool support. This goes beyond simple demos showing that tool ‘xyz’ works on top of Hadoop/HDFS.
PolyBase does not require you to download and install different drivers. The beauty of our approach is that external tables appear like regular tables in your tool of choice. The information about the external data sources and file formats is abstracted away. Many Hadoop-only solutions are not fully SQL-ANSI compliant and do not support various SQL constructs. With PolyBase, however, you don’t need to rewrite your apps because it uses T-SQL and preserves its semantics. This is specifically relevant when users are coming from a ‘non-Java/non-Hadoop world’. You can explore and visualize your data sets either by using the Microsoft BI solutions (initiated on-premises or through corresponding Azure services) or by using the visualization tool of your choice. PolyBase keeps the user experience the same.
3. Simplified ETL & Fast Insights
It’s already a painful reality that many enterprises store and maintain data in different systems, each optimized for different workloads and applications. Admins spend a lot of time moving, organizing, and keeping data in sync. This reality imposes another key challenge that we address with PolyBase – in addition to querying data in external data sources, a user can achieve simpler and more performant ETL (extraction, transformation, loading). Unlike existing connector technologies, such as SQOOP, a PolyBase user can use T-SQL statements to either import data from external data sources (CTAS) or export data to external data sources (CETAS).
T-SQL CETAS statement to age out Hadoop & PDW data to Azure
CREATE EXTERNAL TABLE Old_Data_2008_Azure
WITH (LOCATION='//Sensor_Data/2008/sensordata.tbl', DATA_SOURCE=Azure_DS, FILE_FORMAT=DelimText2)
AS SELECT T1.* FROM Machine_Information_PDW T1 JOIN SensorData_ExternalHDP T2
ON (T1.MachineKey = T2.MachineKey) WHERE T2.YearMeasured = 2008
Combines data from external Hadoop and PDW sources and stores the results in Azure
Under the covers, the PolyBase query engine not only leverages the parallelism of an MPP system, it also pushes computation to the external data source to reduce the volume of data that needs to be moved. The entire procedure remains totally transparent to the user while ensuring a very fast import and export of data that greatly outperforms any connector technology offered today. With the CTAS statement, a user can import data into the relational PDW region, where it is stored as a columnstore. This way, users can immediately leverage the columnstore technology in APS without any further action.
T-SQL CTAS statement for importing Hadoop data into PDW
CREATE TABLE Hot_Machines_2011
WITH (DISTRIBUTION = HASH(MachineKey), CLUSTERED COLUMNSTORE INDEX)
AS SELECT * FROM SensorData_HDI
   WHERE SensorData_HDI.YearMeasured = 2011 AND SensorData_HDI.Temperature > 150
Combines PolyBase with column store – Imports data from Hadoop into PDW CCI tables
In summary, PolyBase is more than just another T-SQL front end over Hadoop. It has evolved into a key integrative component that allows users to query, in a simple fashion, data stored in heterogeneous data stores. There is no need to maintain separate import/export utilities. PolyBase ensures great performance by leveraging the computation power available in external data sources. Finally, the user has freedom in almost every dimension, whether it is tuning the system for the best performance, choosing the tools of their choice to derive valuable insights, or leveraging data assets stored both on-premises and within the Azure data platform.
Watch how APS seamlessly integrates data of all sizes and types here
Hadoop Summit kicked off today in San Jose, and T. K. Rengarajan, Microsoft Corporate Vice President of Data Platform, delivered a keynote presentation where he shared Microsoft’s approach to big data and the work we are doing to make Hadoop accessible in the cloud. At the event, we also announced that Azure HDInsight, our Hadoop-based service in the cloud, now supports Hadoop 2.4.
Investing in Hadoop
Hadoop is a cornerstone of our approach of making data work for everyone. As part of this bet, we have fully embraced the Hadoop ecosystem and have prioritized contributing back to the community and Apache Hadoop-related projects, e.g. Tez, Stinger, and Hive. All told, we’ve contributed 30,000 lines of code and put in 10,000+ engineering hours to support these projects, including the porting of Hadoop to Windows. We’ve done this in partnership with Hortonworks, a relationship that ensures our Hadoop solutions are based on compatible implementations of Hadoop. One of the results of that partnership is the engineering work that has led to the Hortonworks Data Platform for Windows and Azure HDInsight.
Azure HDInsight
The massive scale, power, elasticity, and low cost of storage make the cloud the best place to deploy Hadoop. That’s one of the reasons we have invested heavily in our cloud-based Hadoop solution, Azure HDInsight, which combines the best of open source with the flexibility of cloud deployment. It’s also integrated with our business intelligence tools, enabling easy access and transformation of data from HDInsight to Excel and Power BI for Office 365.
Today we are providing an update to Azure HDInsight with support for Hadoop 2.4, the latest version of Hadoop. This update includes interactive querying with Hive using advancements based on SQL Server technology, which we are also contributing back to the Hadoop ecosystem through project Stinger. With this update to HDInsight, customers can use the speed and scale of the cloud to gain a 100x response time improvement.
HDInsight is just one part of our comprehensive data platform, which includes the building blocks customers need to process data anywhere it lives and in the format where it is born, whether they use Microsoft Intelligent Systems Service to capture machine-generated data within the Internet of Things, SQL Server or Azure SQL Database to store and retrieve data, Azure HDInsight to deploy and provision Hadoop clusters in the cloud, or Excel and Power BI for Office 365 to analyze and visualize data.
Visual Studio 14 CTP was made available today; see the announcement post on the Visual Studio blog for more details. This post covers the places where Entity Framework is included in the release and some limitations to be aware of when using it.
Entity Framework Tools
As with past versions of Visual Studio, the Entity Framework Tools are included in-the-box. These tools are capable of working with models created with all versions of Entity Framework up to and including EF6.x.
Visual Studio 14 CTP includes an older build of the EF6 tooling which does not include the bug fixes and improvements from the 6.1.0 and 6.1.1 releases. Future previews of Visual Studio 14 will be updated to the latest version of the tooling.
At this stage, there isn’t a version of the 6.1.0 or 6.1.1 tooling that can be installed on Visual Studio 14.
Entity Framework 6 Runtime
An older build of the EF6 runtime is included in a number of places in Visual Studio 14 CTP. This build does not include the bug fixes and improvements from the 6.1.0 and 6.1.1 releases.
The runtime will be installed if you create a new model using the Entity Framework Tools in a project that does not already have the EF runtime installed.
The runtime is pre-installed in new ASP.NET projects, depending on the project template you select.
We recommend using NuGet to update to the latest version of the runtime. At the time of writing 6.1.0 was the latest stable release and 6.1.1-beta1 was available. For detailed information on how to upgrade, see Updating a Package in the NuGet documentation.
Entity Framework 7
We recently blogged about our plans for Entity Framework 7. Visual Studio 14 CTP includes an early preview of ASP.NET vNext, which in turn includes a very early build of EF7. The EF7 runtime is installed in new ASP.NET vNext projects that are created.
This build of EF7 only implements very basic functionality and there are a number of limitations with the features that are implemented. Please bear in mind that this preview is designed to give you an idea of what the experience will be like, and you will quickly hit limitations if you deviate from the code in the default project template.
The EF7 code base is still in the very early stages of development, but if you want to experiment with a build we would recommend visiting our GitHub wiki for information on using nightly builds. Just remember that there are lots of things that don’t work… seriously… we warned you :)!