Channel: Data Platform

How to use OData Client Code Generator to generate client-side proxy class


In this tutorial, you will generate an OData client proxy class for an OData V4 service using the OData Client Code Generator.

Install OData Client Code Generator

Start Visual Studio. From the TOOLS menu, select Extensions and Updates.

In the left panel, expand Online -> Visual Studio Gallery. Search for “OData Client Code Generator” in the search box and download the VSIX.


After the VSIX is downloaded, the installer page appears. Click Install.


The following dialog appears when the installation finishes. Click Close.


You need to restart Visual Studio for the changes to take effect.

Create Your Application

Create your project. The OData Client Code Generator works with any project type, but here we use a Console Application project as an example.


Add OData Client Proxy File

In Solution Explorer, right-click your project name. Click Add -> New Item.


In the left panel, expand Installed -> Visual C# Items -> Code. Choose “OData Client”, name your client file (for example, “NorthwindProxy.tt”), and click Add.


This step adds two files to your project:

1. NorthwindProxy.odata.config contains all the configurations you need to set:

· MetadataDocumentUri

The service document URI, the service metadata URI, or a file path to a local copy of the metadata.

· NamespacePrefix

o The value will be used as the namespace of your client proxy when metadata contains only one schema.

o The value will be added before namespaces of schemas when metadata contains several schemas.

o The namespace in the metadata will be used if the value is an empty string.

2. NorthwindProxy.tt is the T4 template for generating proxy file from metadata document.

Save your configuration file.


Re-trigger the code generation by right-clicking NorthwindProxy.tt and selecting “Run Custom Tool”. The generated .cs file appears in Solution Explorer under the .tt file.


Consume the generated code

Now that we have generated the client proxy for the service, you can reference it in your client code. For example:

using System;
using System.Linq;
using ODataClientCodeGeneratorSample.ODataWebExperimental.Northwind.Model;

namespace ODataClientCodeGeneratorSample
{
    class Program
    {
        static void Main(string[] args)
        {
            NorthwindEntities dc = new NorthwindEntities(new Uri("http://services.odata.org/V4/Northwind/Northwind.svc/"));
            var products = dc.Products.ToList();

            Console.WriteLine(products.Count);
        }
    }
}
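The generated proxy also translates LINQ queries into OData query options. As a minimal sketch building on the Main method above (and assuming the standard Northwind Products schema with its UnitsInStock property), a filtered query could look like this:

var inStock = dc.Products
    .Where(p => p.UnitsInStock > 0)
    .ToList(); // sent as GET .../Products?$filter=UnitsInStock gt 0

Console.WriteLine(inStock.Count);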

Leeds Teaching Hospitals Transform Healthcare Using Big Data Solution to Provide Faster Access to Information


Since the launch of the Windows Azure HDInsight Service we’ve seen many companies using Big Data solutions based on HDInsight to gain benefits from analyzing unstructured data within their organization – a good summary of some of these organizations was recently showcased by CIO Magazine.

One of the early adopters of HDInsight was Microsoft partner Ascribe, a leading provider of business intelligence (BI) and clinically focused IT solutions and services for the healthcare industry. Ascribe wanted to create not just a proof-of-concept BI solution for monitoring infectious disease on the national level, but also a tool that could be used to improve operations for local care providers. “Our goal was to find a way to make data flow more quickly in near-real time,” says Paul Henderson, BI Division Head at Ascribe.

The company asked Leeds Teaching Hospitals, one of the biggest NHS trusts in the UK, to participate in the project. Leeds can generate up to half a million structured records each year in its Emergency Department system. The hospital also generates approximately 1 million unstructured case files each month.

Ascribe decided to implement a hybrid cloud solution based on Microsoft SQL Server 2012 Enterprise software running on-premises and Windows Azure HDInsight Service running on the Windows Azure platform. Ascribe planned to take advantage of BI tools built into SQL Server 2012, as well as multiple other products and services.

By using natural-language processing and patient-level data in the pilot project, the team identified 30 distinct scenarios where the hospital would be able to improve operations and reduce costs. “There are huge efficiency savings to be had from taking slates to a bedside and updating the patient record there,” says Andy Webster, Lead Clinician, Emergency Department at Leeds Teaching Hospitals.

Leeds looks forward to transforming its operations in several scenarios. For example, the hospital can now look at case notes and identify costly services such as tests that were not recorded in clinical databases. Additionally, by using analytics capabilities and self-service BI tools, healthcare providers can respond faster and collaborate more easily with peers. “If you’re looking at the spread of infectious disease on the national level, you want to figure out what is happening as quickly as possible,” says Henderson. 

By using a distributed platform to aggregate and process information from disparate sources such as emergency room notes, school attendance records, and social media, Ascribe is unlocking the potential of huge volumes of data.  You can learn more about the Leeds Teaching Hospital solution by reading the more detailed case study here.  

You can also read the announcement about Windows Azure HDInsight supporting preview clusters of Hadoop 2.2, and read how to do it here or watch the video here.

HDInsight leverages Microsoft’s partnership with Hortonworks. View a Hortonworks-Microsoft webinar on the Hybrid Modern Data Architecture, and read more about Microsoft’s involvement with Hadoop through Stinger and partnering with Hortonworks.

Innovation and the Data Platform Continuum


All of us who work in technology-related fields have experienced disruptions that seem to change everything, and figuring out how to manage through the transition can mean the survival or demise of an enterprise. When faced with innovation, it is critical to determine when, where, and how to adopt the change.

At Microsoft, we talk a lot about the shift to cloud computing.  As we continue to invest in this shift, we are developing and delivering outstanding on-premises data platform solutions, capabilities to bridge between the cloud and on-premises, and advancing cloud technologies for next generation applications.

As examples, consider:

  • The breakthrough performance of In-Memory OLTP and In-Memory DW in SQL Server 2014.
  • Hybrid scenarios like enrolling an Azure-hosted virtual machine running SQL Server with an on-premises Availability Group or high-performing appliances running queries or moving data across both on-premises and cloud environments.
  • Completely new ways for developers to write mission critical applications using services like Azure SQL Database, with built-in high availability, performance, and business continuity.

At Microsoft, I lead the program management team responsible for data platform technologies such as SQL Server, Parallel Data Warehouse, Azure SQL Virtual Machines, and Azure SQL Database. I’ve spent nearly my entire professional career building enterprise—and now cloud—software.

While on this journey, I have had the privilege of talking to customers around the world, to be part of major technology changes, to make some pretty big mistakes, and recently to go through one of the largest engineering culture changes in my career. The shift from building box-cycle software to a cloud computing cadence has been exhilarating, to say the least.

In this post, I hope to share some insight into how we think about our data platform and what’s to come.

PLATFORM CONTINUUM

The North Star that guides our work on the modern data platform is to deliver products and integrated services that empower developers to build high performance applications that are flexible and scale free. We talk about the data platform as a continuum of capabilities, spanning on-premises databases, hybrid or cloud-attached solutions, and Azure assets that each provide a unique set of capabilities.

The concept of a continuum of capabilities enables developers to continue to use SQL Server on-premises, to easily virtualize and move database workloads into Azure, and to attach Azure services and build new cloud applications all from one data platform.

 WHY IT’S IMPORTANT & WHAT’S TO COME

The Microsoft data platform addresses a broader set of scenarios than anyone else because it encompasses solutions for on-premises, hybrid (cloud-attached), and full cloud data services. In fact, you could think of this data platform continuum as the New Tier1 for modern mission critical applications.

On-premises—While we anticipate massive growth in the use of cloud computing, it is also true that there will always be opportunities for on-premises, highly regulated database workloads that might never move to the cloud. To support these workloads, we continue to embrace hardware trends that cater to very high-scale applications and deliver strategic capabilities which maximize performance and availability via on-premises SQL Server databases.

I’ve met many developers and IT professionals who can explain in detail every aspect of their mission critical OLTP applications. It’s both impressive and demanding. They require full environmental control, with good reason. At the same time, these customers and others are exploring new and more efficient ways of doing things.

Hybrid—I have worked with many customers who have spent months designing a disaster recovery strategy and then additional months and large expense just trying to realize it. Today, many on-premises SQL Server customers have discovered new ways of planning for disaster recovery by way of the cloud. One example is enrolling an Azure-hosted virtual machine running SQL Server in an on-premises SQL Server Availability Group. Another example is configuring on-premises SQL Server to encrypt and continuously back up to Azure data management services. Both scenarios tap into Windows Azure, which provides an infinite level of scale and geo redundancy for a fraction of the cost.

A function like SQL Server <-> Windows Azure backup is going to reap the rewards of continuous reduced storage costs in the cloud and maintain the benefits of mission critical business continuity for pennies on the dollar. I expect an extraordinary amount of innovation to happen in the hybrid space. Frankly, what’s mentioned here is just the tip of what’s possible.

Mission Critical—Lately, I’ve been thinking about how customers talk about their applications as “mission critical.” In the past, truly mission critical applications were a small but very important set of applications, usually constrained by cost in terms of development expense and talent. Going forward, I expect a much larger number of mission critical applications will emerge because the economics and technology advances can allow the highest service-level-agreements at price points that were unimaginable in the past.

Virtualization—An extraordinary number of companies have virtualized their environments, lifted and shifted applications to the cloud, and built development and test environments that spin up and down in Azure Virtual Machines. We’re talking millions of compute hours and platform telemetry providing insight unlike ever before. Imagine petabytes of telemetry from a global footprint of virtual machines powering millions of applications that humans and machines interact with to understand usage patterns or predict failures, etc.

Big Data—If you believe that almost every vital object on earth will one day emit data to be captured, stored, and analyzed—and many already are—then you know how many breakthroughs are really just waiting to be discovered in the world. Think about what a medical researcher with a $5,000 grant could actually do several years ago compared to what they can do today, and then imagine tomorrow. Wow! I imagine this researcher driving into work with a new hypothesis, after she was up all night pondering how to cure a disease, and being able to ask a question that thousands of computers then process and answer (at fractional cost). The world of medicine could be forever changed by enabling researchers to do things that were just not feasible in the past. Or, imagine what an automobile manufacturer could learn from every single car it built emitting telemetry. This could result in smarter, safer, and more efficient cars.

Appliances—Another important data platform asset is Microsoft Parallel Data Warehouse (PDW). It’s a mission critical, pre-built data warehouse appliance that enables analytics across structured and unstructured data at massive scale (up to 10 PB) with 10-100X performance gains relative to an SMP data warehouse on SQL Server. PDW includes a built-in HDInsight region and PolyBase, a remarkable innovation because it enables new classes of hybrid insights. With a focus on scenarios that start on-premises, PolyBase enables queries to run across Azure SQL Database and HDInsight data stores in the cloud and on-premises using T-SQL. In both on-premises and hybrid scenarios, existing BI tools work transparently out of the box.

Cloud Computing—I love what our customers are doing with Azure SQL Database. They are redefining what it means to build a data-centric application in the cloud. To me, the developer promise of Platform as a Service solutions like SQL Database means that all applications can be Tier1-enabled and achieve the highest service-level-agreements (SLAs) across the Azure platform without IT effort. This is major step forward as most data applications require both IT professionals and developers to design, build, and operate the entire system. With SQL Database, a developer no longer needs to think about designing availability, performance, and business continuity into the system because it’s built-in and driven by SLAs for the platform. And, going forward, developers will no longer spend time elaborating over scale concerns because the platform includes a simple contract to elastically scale applications without rewrites and extraordinary burden.

SUMMARY

Looking forward, I’m more excited about the future of data platform technology than I’ve ever been because we are seeing breakthroughs in every industry segment: manufacturing, finance, healthcare, entertainment, agriculture, etc., and maybe these breakthroughs result in a better world for generations to come. I’ve never seen anything like this in my career, and it’s humbling to know that we all have an opportunity to be part of it.

We believe the continuum of on-premises, hybrid or cloud-attach, and Windows Azure assets provides a unique connection of elements that customers won’t find anywhere but Microsoft and enables the incredible ideas and aspirations of generations to come.

For more information, see SQL Server, Windows Azure Data Services, Parallel Data Warehouse (PDW).

Shawn Bice
Director of Program Management
Data Platform Group
Microsoft Corp.

Migrations Screencast Series


We recently published a three part series on using Entity Framework Code First Migrations. The short series is designed to help you use migrations in the ‘real world’ where you often work in a development team and/or have to write applications that interact with existing database schemas.

The screencasts cover some concepts using diagrams and then show these concepts in action with a demo in Visual Studio.

UnderTheHood

Migrations – Under the Hood

This screencast digs into how migrations works, including how it detects changes to the model by storing model snapshots in each migration.

This is useful for all developers to know, but also a pre-requisite to understanding the concepts in the other two screencasts.

TeamEnvironments

Migrations – Team Environments

This is the companion screencast to our Code First Migrations in Team Environments documentation.

This screencast covers using Code First Migrations in a development team, where two or more developers are working on the same project, using a source control system.

ExistingDatabases

Migrations – Existing Databases

This is the companion screencast to our Code First Migrations with an existing database documentation.

This screencast covers how to enable migrations for an existing database and then use migrations to propagate changes in your Code First model to the database.

Containment is Coming with OData V4


Before OData V4, every entity set had to be accessed from the top level, even if it was in a containment relationship with another set. For example, assume a schema where order lines (type: “OrderLine”, entity set: “OrderLines”) live within orders (type: “Order”, entity set: “Orders”). Order has an “id” member that is its key, and OrderLine has “id” and “OrderId” (container id) members that form a composite key. There are two issues with this kind of model.

  • Deep addressing through container has redundant information, e.g. /Orders(1)/OrderLines(1,2)  (the container id, “1”, is repeated)
  • Entities are always addressable through top-level entity sets, e.g. /OrderLines(1,2)

To solve the above problems, the containment navigation property (“containment” for short) is introduced in OData V4 (containment spec). Using containment, you can define an implicit entity set for each instance of its declaring entity type. The implicit set cannot be accessed from the top level, and the redundant-information issue is resolved, too.

The following sections introduce how to consume containment. (Sample Code)

1. EdmModel Definition

The containment navigation property must be defined in the EdmModel before it can be consumed. There are two differences between defining a containment navigation property and a non-containment one.

  • The “ContainsTarget” attribute is set to true for a containment navigation property. If no value is assigned to the “ContainsTarget” attribute, it defaults to false.
  • There is no need to call the Edm API AddNavigationTarget to set the target set of a containment navigation property, since an entity cannot both belong to an entity set declared within the entity container and be referenced by a containment relationship.

The following code defines a containment navigation property “ShoppingCart” on Customer, whose target type is Product.
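The original listing is in the linked sample; the snippet below is a minimal sketch of such a definition using the ODataLib 6.x Edm APIs (the “Sample.NS” namespace, key layout, and container name are illustrative assumptions):

using Microsoft.OData.Edm;
using Microsoft.OData.Edm.Library; // EdmModel, EdmEntityType, etc. in ODataLib 6.x

static class ContainmentModelSample
{
    public static EdmModel BuildModel()
    {
        var model = new EdmModel();

        var product = new EdmEntityType("Sample.NS", "Product");
        product.AddKeys(product.AddStructuralProperty("Id", EdmPrimitiveTypeKind.Int32));
        model.AddElement(product);

        var customer = new EdmEntityType("Sample.NS", "Customer");
        customer.AddKeys(customer.AddStructuralProperty("Id", EdmPrimitiveTypeKind.Int32));
        model.AddElement(customer);

        // ContainsTarget = true marks "ShoppingCart" as a containment navigation property;
        // note that no AddNavigationTarget call is made for it.
        customer.AddUnidirectionalNavigation(new EdmNavigationPropertyInfo
        {
            Name = "ShoppingCart",
            Target = product,
            TargetMultiplicity = EdmMultiplicity.Many,
            ContainsTarget = true
        });

        var container = new EdmEntityContainer("Sample.NS", "Container");
        container.AddEntitySet("Customers", customer);
        model.AddElement(container);

        return model;
    }
}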

 

In CSDL, the navigation property is represented as a NavigationProperty element with the ContainsTarget attribute set to true.

2. Uri Parser

For a containment navigation property, the segment is parsed into a NavigationPropertySegment by the URI parser.
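Here is a minimal sketch of parsing such a path (assuming the model sketched above and a hypothetical service root; the “ShoppingCart(5)” part parses into a NavigationPropertySegment followed by a KeySegment):

using System;
using Microsoft.OData.Core.UriParser;
using Microsoft.OData.Core.UriParser.Semantic;
using Microsoft.OData.Edm;

static class ContainmentUriParserSample
{
    public static void ParseContainmentPath(IEdmModel model)
    {
        var parser = new ODataUriParser(
            model,
            new Uri("http://host/service/"),
            new Uri("Customers(1)/ShoppingCart(5)", UriKind.Relative));

        ODataPath path = parser.ParsePath();

        // Prints EntitySetSegment, KeySegment, NavigationPropertySegment, KeySegment.
        foreach (ODataPathSegment segment in path)
        {
            Console.WriteLine(segment.GetType().Name);
        }
    }
}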

3. Write the containment entities

There is not much difference between writing a containment navigation property and a non-containment one.

For a containment navigation property whose type is a collection, the ODataFeedWriter needs to be used, just as for a top-level collection.

For a containment navigation property whose type is single-valued, you can use the ODataEntryWriter.

4. Read the containment entities

Sample client-side code that reads a collection-valued containment navigation property can be found in the sample code linked above.

Using Parameter Alias to simplify the OData URL


Are you suffering from typing complex parameter values in a URL? Or, even worse, do you need to type them more than once?

To solve this problem, now Microsoft.OData.Core.dll 6.0 supports parameter alias in request URL, such as: ~/People?$filter=LastName eq @name and contains(Description, @name)&@name=’Bob’

It is implemented by introducing a new class, ParameterAliasNode, whose base class is SingleValueNode. The alias (@name) is parsed into a ParameterAliasNode.

Note: This blog assumes that you have experience writing ODL server-side code.

How do you leverage this feature? Just three steps:

1. Create an instance of ODataUriParser as normal:

2. Update the node visitor to support ParameterAliasNode:

3. Call the visitor to translate the node tree into expression:
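Putting the three steps together, here is a minimal sketch (assuming an IEdmModel with the People entity set from the example URL; the visitor override is shown as a comment because the rest of the translator is application-specific):

using System;
using Microsoft.OData.Core.UriParser;
using Microsoft.OData.Core.UriParser.Semantic;
using Microsoft.OData.Edm;

static class ParameterAliasSample
{
    // Step 1: create the ODataUriParser as normal; the alias definition (&@name='Bob')
    // is part of the query string and is picked up by the parser.
    public static FilterClause ParseAliasedFilter(IEdmModel model)
    {
        var parser = new ODataUriParser(
            model,
            new Uri("http://host/service/"),
            new Uri("People?$filter=LastName eq @name and contains(Description,@name)&@name='Bob'",
                UriKind.Relative));

        // The @name references appear in the resulting tree as ParameterAliasNode instances.
        return parser.ParseFilter();
    }

    // Steps 2 and 3 (sketch): in your QueryNodeVisitor-based translator, add an override
    // along these lines, where ResolveAlias is your own lookup from the alias name to the
    // parsed value node, and then run the visitor over the tree as usual:
    //
    // public override Expression Visit(ParameterAliasNode nodeIn)
    // {
    //     return ResolveAlias(nodeIn.Alias).Accept(this);
    // }
}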

Then execute the LINQ expression as normal. That’s it.

Microsoft Brings Innovations From SQL Server To Hadoop With Stinger


Over the last two years, Microsoft has been sharing our progress in Big Data by working to bring Hadoop to the masses through the simplicity and manageability of Windows and Azure. In 2012, Microsoft communicated our expanded partnership with Hortonworks by making Hortonworks Data Platform the core of our Hadoop solution on-premise and in the cloud.

As part of this partnership, Microsoft has collaborated with Hortonworks and the open source software (OSS) community to contribute directly to Apache Hadoop. While our first wave of contributions with Hortonworks has been to port Hadoop to Windows, we’ve recently contributed to other projects like the Stinger initiative to dramatically speed up the performance of Hive and make Hadoop enterprise-ready.

About The Stinger Initiative

In collaboration with Hortonworks and others from the OSS community, Microsoft has brought some of the technological breakthroughs of SQL Server to Hadoop. In SQL Server 2012, we introduced the in-memory columnstore which included a vectorized query execution engine and a columnar format that demonstrated 10-100x performance gains on data warehouse queries. While these improvements varied by customer scenarios, some achieved upwards of 600x. With the Stinger initiative, Microsoft is collaborating with Hortonworks to bring similar query execution and columnstore technologies to Hadoop so that we can collectively improve the performance of Hive up to 100x.

The first fruits of this have already been realized with Hortonworks Data Platform 2.0 for Windows and with HDInsight previewing Hadoop 2.2 clusters. Both of these Hadoop solutions leverage phase 2 of the Stinger project, which delivers up to 40x improvements in query response times and up to 80% data compression.

Microsoft has been pleased to be a part of the open source software (OSS) Big Data community for the past year and a half. We’ve gained a lot from the community and are delighted to continue our partnership with Hortonworks and to bring more innovations to Hadoop.

We invite you to learn more about Microsoft’s Hadoop offerings.

Pie in the Sky (March 14th, 2014)


The web is 25 years old this week. I remember my manager asking me (back in the early 90's, when the web started going mainstream) if I thought this World Wide Web thing would take off and whether it was worth investing time in learning about web pages or how to use a browser. :)

Cloud

Client/Mobile

JavaScript/Node.js

Ruby

Misc

  • Vagrant supports Hyper-V out of the box: Wow, it was only about a week ago that support for Hyper-V was added, and now here's Vagrant 1.5 with support for it out of the box

  • Bing Code Search: Searches code repositories like StackOverflow, and it has a plugin to let you search, copy & paste code directly in Visual Studio. Sort of like Google Code Search (which shut down around this time last year)

Enjoy!

-Larry


The New and Improved Cardinality Estimator in SQL Server 2014


One area that received a lot of development work in SQL Server 2014 is query optimization, where a crucial component was rewritten in order to improve query plan quality and predictability.

To get a better idea of the significance of the change and how one should deal with it, let’s first check what the cardinality estimation component is responsible for.

When a query is executed for the first time, it gets ‘compiled’. In contrast to the compilation of executables, where we talk about binary compilation, the compilation of a SQL statement is limited to figuring out the most efficient access to the data requested. This means, for example, that the query engine needs to decide which index to pick in order to have the most efficient access to the data rows or, in the case of table joins, to decide the order of the join and the join methods. At the end of the compilation there is a descriptive query plan which will be cached. During the execution phase the query plan is read from cache and interpreted. Since compilation of a query plan represents significant resource consumption, one usually is keen to cache these query plans.

At a very high level, the Query Optimizer uses the following input (in order) during the compilation phase:

  • The SQL Query and/or Stored Procedure
  • If the query or stored procedure is parameterized, then the value of the parameters
  • Knowledge about the index structures of table(s) to be accessed
  • Statistics of index(es) and columns of the table(s) to be accessed
  • Hints assigned to the query
  • Global max degree of parallelism setting

There might be a few more inputs, but these are the main ones. The first step, which often is decisive for the performance of the query, is to estimate how many rows the query might deliver back based on the inputs above. It is not only the total number of rows returned by the query that is estimated: in order to decide on the correct join method in, for example, a table join, one also needs to estimate how many rows would be delivered by each execution step and query branch accessing each of the tables involved in the join. This is done in the so-called Cardinality Estimation (referred to as CE from now on), which this article is about. The estimated number of rows is the basis for the next steps of query optimization, which assume that the estimate is correct. This means the choice of a certain index, a join order, or a join type is majorly impacted by this first step of estimating the cardinality of a query.

If that estimation is plain wrong, one can’t expect the subsequent steps of query optimization to derive an efficient plan, unless the query itself leans toward a particular index (for example, querying exactly one row by specifying the primary key) or hardly any indexes are available to be used.

In order to estimate the number of rows returned, the CE requires the statistics over indexes and columns of tables as a reliable source of information to figure out how often a certain parameter value submitted with the query is present. Hence, if statistics are missing, plain wrong, or stale, even the best CE logic will hardly be able to predict the right number of rows returned for a query.

The role the CE plays in generating a good query plan is important. One can hardly expect a great query execution plan when the CE estimates are completely off; historically, a good share of the workload that required SQL Server code fixes circled around issues in cardinality estimation.

What changed in SQL Server 2014?

At a very high level, the CE process in SQL Server 2012 was the same through all prior releases back to SQL Server 7.0 (the release before SQL 2000). In recent years we tried to put a lot of fixes or QFEs in regard to query optimization (including cardinality estimation) under trace flags in order not to cause a general regression across all those releases due to the changed behavior of a fix.

With the release of SQL Server 2014, there were several good reasons to overhaul what was basically the CE as introduced 15 years ago. Our goals in SQL Server development certainly were to avoid issues we had experienced over the past 15 years and which were not fixable without a major redesign. However, to state it pretty clearly as well, it was NOT a goal to avoid any regressions compared to the existing CE. The new SQL Server 2014 CE is NOT integrated following the principles of QFEs. This means our expectation is that the new SQL Server 2014 CE will create better plans for many queries, especially complex queries, but will also result in worse plans for some queries than the old CE did. To define what we mean by better or worse plans: better plans have lower query latency and/or fewer pages read, and worse plans have higher query latency and/or more pages read.

Knowing that the CE has changed and purposefully was not overcautiously re-architected to avoid regressions, you need to prepare a little more carefully for a SQL Server 2014 upgrade if you will utilize the new Cardinality Estimator. Based on our experience so far, you need to know first:

  • As of January 2014, feedback about the new CE is very good, showing only a small number of cases where query performance showed regressions compared to the old CE
  • Whether the overall picture with the new CE is positive or negative is dependent on the nature of the queries the workload is generating and the distribution of the data
  • Even with ISV applications it is not always easy to tell whether a certain ISV application benefits from the new CE or not. The reason is that a lot of ISV applications are highly customizable and therefore can produce very different workloads when used by different customers.

Activation/De-Activation of the new CE

As delivered, SQL Server 2014 decides whether the new Cardinality Estimator will be utilized for a specific database based simply on the compatibility level of that database within the SQL Server 2014 instance. If the compatibility level of a specific database is set to ‘SQL Server 2014 (120)’, the new CE is going to be used.

If the Compatibility level of a specific database is set to a lower value, like ‘SQL Server 2012 (110)’, the old CE is going to be used for the specific database.

As with other releases of SQL Server, upgrading your SQL Server instances in place or attaching existing databases from earlier releases of SQL Server will not change the compatibility level of a database. Hence, purely upgrading a SQL Server instance or attaching a database from SQL Server 2012 or an earlier release will not activate the new CE. One would need to change the compatibility level of such a database manually to ‘SQL Server 2014 (120)’ to activate the new CE. However, toggling between the compatibility levels 110 and 120 in SQL Server 2014 has some impact beyond the change of CE algorithms; for example, the parallel insert functionality of SELECT INTO is also disabled by staying on the old compatibility levels.

Another alternative to enable or disable the different CE algorithms is the use of a trace flag. Even with the compatibility level of a database set to ‘SQL Server 2014 (120)’, the trace flag would enforce the usage of the old CE. In contrast to the compatibility level, which is applied to a specific database, the trace flag, if used as a startup trace flag, affects the usage of the CE SQL Server instance-wide. This behavior might not be desired when many different databases are run in one instance. However, for situations where one runs one user database per SQL instance or consolidates several databases of one type of application under one SQL Server instance, it indeed might make sense to use the trace flag approach. In order to force the old CE algorithms, trace flag 9481 can be used as a startup trace flag.
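As a minimal sketch (hypothetical connection string and database name), switching a database to the new compatibility level is a single statement, which can just as well be run from Management Studio:

using System.Data.SqlClient;

class CompatibilityLevelSample
{
    static void Main()
    {
        using (var connection = new SqlConnection("Server=.;Database=master;Integrated Security=true"))
        {
            connection.Open();

            // 120 activates the new CE for MyAppDb; 110 (or lower) keeps the old CE.
            using (var command = new SqlCommand(
                "ALTER DATABASE [MyAppDb] SET COMPATIBILITY_LEVEL = 120;", connection))
            {
                command.ExecuteNonQuery();
            }
        }
    }
}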

Why can query plans change with the new CE?

Calculate combined density of filters differently

In this section we refer quite a bit to index or column statistics and the specific data that is captured in those. For folks less familiar with how index or column statistics are structured and what is contained in those, it would be advisable to read the Microsoft TechNet article, “Statistics Used by the Query Optimizer in Microsoft SQL Server 2008” before continuing.

One of the most significant changes in the new CE is the way the CE calculates the selectivity of a combination of filters submitted with a query. In the old CE, the assumption was that the selectivity of the filters was independent, meaning that any combination of values stored in the different columns the filters are applied to would have the same probability of occurring. However, very often the reality is that combinations of values of different columns are not that independent. We can find masses of cases where values in different columns within one table are not independent of each other. Think about a car brand like Toyota, Volkswagen, Ford, Mercedes, etc. being represented in one column and the model, like Corolla, Jetta, Focus, or E-Class, being represented in another column of the table. It immediately becomes clear that certain combinations can’t happen, since Volkswagen does not produce a car named ‘Corolla’.

Let’s look at what the differences really are in the calculation of selectivity.

Let’s assume the following indexes against a table like this:

index_name   index_description                                   index_keys
FILCA~0      clustered, unique, primary key located on PRIMARY   RCLNT, GL_SIRID
FILCA~1      nonclustered located on PRIMARY                     RCOMP, RACCT, RYEAR, RVERS, RLDNR
FILCA~2      nonclustered, unique located on PRIMARY             DOCNR, DOCCT, RLDNR, RYEAR, RCOMP, ROBUKRS, DOCLN, RCLNT
FILCA~3      nonclustered located on PRIMARY                     REFDOCNR, REFRYEAR, REFDOCCT, RCOMP, ROBUKRS, RLDNR, RCLNT, REFDOCLN

 

The query issued against the table looks like:

SELECT *
FROM "FILCA"
WHERE "RCLNT" = '001' AND "RLDNR" = '10' AND "RRCTY" = '0' AND "RVERS" = '001' AND "RCOMP" = 'ABXYZ'
  AND "RACCT" = '1212121212' AND "RMVCT" = '' AND "RYEAR" = '1997' AND "RLEVL" = '0' AND "DOCTY" = ''
  AND "POPER" BETWEEN '001' AND '006' AND "DOCCT" = 'A'

SQL Server’s query optimizer using the old CE would choose the first non-clustered index (FILCA~1) to access the table. This means there would be five filters which could be applied since all the five columns of the ~1 index are specified in the query.

In the single-column statistics, the densities of the individual columns look like this:

Column Name   Density
RACCT         0.125
RVERS         0.3333
RCOMP         0.5
RLDNR         0.5
RYEAR         1

Treating the selectivity/density of each filter/column as independent, the combined density would be:

0.125 * 0.3333 * 0.5 * 0.5 * 1 = 0.0104

This value is confirmed when we look at the combined densities displayed in the statistics of index FILCA~1:

All density   Average Length   Columns
0.5           11.14286         RCOMP
0.0625        31.14286         RCOMP, RACCT
0.0625        39.14286         RCOMP, RACCT, RYEAR
0.02083333    45.14286         RCOMP, RACCT, RYEAR, RVERS
0.01041667    49.14286         RCOMP, RACCT, RYEAR, RVERS, RLDNR
0.01041667    55.14286         RCOMP, RACCT, RYEAR, RVERS, RLDNR, RCLNT
3.60E-05      89.14285         RCOMP, RACCT, RYEAR, RVERS, RLDNR, RCLNT, GL_SIRID

Assuming an even distribution of the values in the columns, we can work with the density values displayed above. Be aware that the CE, using parameter sniffing, would use the particular density value for a submitted value if the data distribution were uneven; for simplicity we assume an even distribution. Since our table holds only 27776 rows, the estimated number of rows is basically calculated as:

# of rows / (1/combined density) which would be:   27776/(1/0.01041667) = 289.33

This means CE would expect to get 289 rows back using this index FILCA~1

Looking at the query, it becomes clear that not all columns specified in the WHERE clause can be filtered using the non-clustered index. After reading the 289 estimated entries of the non-clustered index, the real data rows would be read and the additional column filters applied then.

Now to the change with the new CE, where we assume that the combinations of the different column values are not as independent. The assumption is that certain combinations of values do not come up at all, whereas others show up more frequently. Besides the example with car brands and their models, other examples of such dependencies between columns are found in all business processes and therefore in the underlying data in database tables. For example, materials or goods that are produced or stored only in certain locations and warehouses have such data dependencies. Or, employees associated with a certain profit center or organization also represent such data dependencies. This means the reality is that the values in different columns are sometimes related to each other. In order to reflect that in the calculation of the possible selectivity/density, we apply a new formula with a so-called ‘exponential back-off’. The logic works like this:

  • Sort the filters according to their density, with the smallest density value first. Since density is expressed in a range between 0.xxxx and 1, a smaller value means lower density, better selectivity, or more distinct values.
  • We also use only the four most selective filters to calculate the combined density. Looking at the formula we use, it becomes clear why the 5th and subsequent density values would not make a big impact anymore. In our case the order looks like:
    • 0.125, 0.3333, 0.5, 0.5: the single-column density values of the filters which can be applied to the non-clustered index FILCA~1.
  • The calculation formula now would be d1 * d2^(1/2) * d3^(1/4) * d4^(1/8)

If we calculated a 5th or even a 6th filter value with d5^(1/16) or d6^(1/32), those factors would be close to 1 and hence have little impact on the result anyway. Therefore the restriction to the four most selective values does not change the result significantly.

In our case this would be: 0.125 * 0.3333^(1/2) * 0.5^(1/4) * 0.5^(1/8) = 0.0556468

Applying the same formula of # of rows / (1/combined density), we would look at:

27776/(1/0.0556468) = 1539 rows estimated to come out of the non-clustered index.

Which of the estimates is more accurate certainly depends on the actual data distribution at the end.
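For readers who want to reproduce the arithmetic, here is a small sketch that computes both estimates from the single-column densities used above (the numbers apply to this example only):

using System;
using System.Linq;

class CardinalityEstimateSketch
{
    static void Main()
    {
        double[] densities = { 0.125, 0.3333, 0.5, 0.5, 1.0 };
        const double rowCount = 27776;

        // Old CE: multiply all single-column densities (independence assumption).
        double oldDensity = densities.Aggregate(1.0, (acc, d) => acc * d);

        // New CE: take the four most selective densities and apply the
        // exponential back-off d1 * d2^(1/2) * d3^(1/4) * d4^(1/8).
        double newDensity = densities
            .OrderBy(d => d)
            .Take(4)
            .Select((d, i) => Math.Pow(d, 1.0 / Math.Pow(2, i)))
            .Aggregate(1.0, (acc, v) => acc * v);

        // Prints about 289 and about 1546 rows; the small difference from the 1539
        // quoted above comes from rounding of the densities.
        Console.WriteLine("Old CE estimate: {0:F0} rows", rowCount * oldDensity);
        Console.WriteLine("New CE estimate: {0:F0} rows", rowCount * newDensity);
    }
}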

Note: Be aware that the new calculation formula to calculate combined density of filters is not reflected in the index and column statistics. The section which shows ‘all density’ in the statistics is calculated and stored the same way it always has been. No changes there due to the new CE.

A second change with the new CE

The old CE had some problems in the case of ascending or descending key values. These are scenarios where data is added to the table and the keys go in one direction, like a date or an incrementing order number. As the CE, in the attempt to compile a query, tries to estimate the number of rows to be returned, it looks into the column statistics to find indications about the value submitted for this column filter. If the histogram data shows the value directly or gives indications that the value could exist within the value range the statistics cover, we usually are fine and come back with a reasonable estimate. The problem more or less starts where the value we are looking for is out of the upper or lower bounds of the statistics. The old CE then usually assumes that no such value exists. As a result it calculates with one potential row being returned. That is not a real problem if we perform a select that qualifies all the columns that define a primary key or only gets a small number of rows back. It could, however, develop into a larger issue if that estimation of just one row decides that this table access is the inner table of a nested loop, but there happen to be 50K rows with that very value. The fact that a new sequential value falls outside the statistics range can happen simply because the statistics didn’t get updated recently. In order to mitigate the risk of encountering such a situation, we introduced trace flag 2371 a while back, which forces more frequent updates of the statistics. We described this scenario in this blog.

The new CE should handle these cases differently now.

Let’s illustrate this with an example, using the BUKRS column, which contains the company code in a financial table of a widespread ISV application. Let’s assume we have one of those tables with 1 million rows and 100 different company codes between 1000 and 1200. Now let’s go through a few different scenarios:

  • An update statistics with fullscan was performed. If you issue a statement right after the update of the statistics that specifies a BUKRS value which is out of the range of the column statistics on BUKRS, the behavior is the same as with the old CE. This means it would estimate that 1 row is returned. The reason is that, because the statistics were based on a full scan and the row modification counter indicates zero changes affecting the column(s), we know that there was no change to the table since the statistics were updated. Therefore we know that there are no rows with a value that is out of the range of the column statistics of BUKRS. So in this scenario there is no change between the behaviors shown with the new CE and the old CE in general (which was independent of the state and creation of the statistics). Please keep in mind that the method by which the statistics holding the histogram of the column were updated is important. E.g. if the existing statistics were updated in fullscan mode, but due to a new query a column statistic on a specific column got ‘autocreated’ (with default sampling), then the new CE will behave as described in the next point.
  • Or, assume that the update statistics was done in the default manner, which uses only a sample of the data. In this case, the new CE differs in its behavior from the old CE by returning an estimate that is >1. In our particular scenario it returned an estimate of 1000 rows.

Now let’s assume we add another company code and give it a value beyond 1200 with around 10K rows

  • In the case of having all statistics updated in a fullscan manner before the rows got added, the new CE now reports an estimate of around 1000 rows when one specifies a value which is out of range of the current statistics. As the number of rows added increases, the estimate for the rows outside does increase, but not significantly. The number of rows estimated to be found outside the boundaries of the statistics is majorly impacted by the number of values and their occurrence in the column according to the statistics. This means the 1000 rows in our example are just a result of the selectivity on that column (100 different values), the number of rows in the table, and an even distribution. The value of 1000 as estimated in our example should only be regarded against the background of our example and query and not as a generic kind of output for all purposes.
  • For the case of having update statistics performed with the default sampling, one ends up with a similar behavior as with having the update statistics done in fullscan. The initial estimate is a touch different, but the difference is completely insignificant in pure numbers. As one adds more and more rows, the estimate of the rows expected from outside the range does increase slightly. This increase would be the same as seen in the example above. It is also not significant enough to have major impact on the plan choice. Not, at least, in this generic example.

What if we created a new company code within the range of existing company codes? The question is whether the new CE would behave differently in that case. The behavior here is not too different between the old and the new CE. Assuming that the statistics did catch all the company codes, and assuming that there are fewer than 200 company codes, the statistics would indicate that no values exist between the different values documented in the histogram of the statistics. Hence the new as well as the old CE would return an estimate of 1 row only. This same estimation would happen even if there were quite a few changes to the table after update statistics was performed. This means the specific problem class of:

  • Having a small number of different values which are all captured in the stats and are all documented in the histogram part of the statistics in their own bucket
  • Another object gets added within the range of the existing data

is not solved by the new CE either, because the logic applied to out-of-range values is not applied to values which are within the range of the statistics.

If, on the other hand, there are so many values within a column that the 200 buckets are not able to keep all values in the histogram, the old and the new CE both estimate a number of rows for a non-existing value which is within the range of the column statistics. The estimated value is calculated mainly based on the values stored as RANGE_ROWS and DISTINCT_RANGE_ROWS in the column statistics.

The small set of examples and scenarios described here should give an idea of the most significant changes the new CE introduces. Of course, these are hypothetical explanations and, dependent on the structure of and dependencies between values of different columns, they might or might not be close to the truth. But especially with accesses through non-clustered indexes, it can mean that with the new CE one moves a bit earlier to the clustered index instead of the non-clustered index. Since data today is mostly present in memory, however, the difference might not play out to be too significant.

On the other side, due to the way the new CE treats ascending key scenarios, table joins might benefit from better plans. Therefore more complex table joins could perform better, or have lower query latency, using the new CE. This is usually caused by a different plan being generated that provides faster execution.

How can we detect whether a plan got generated with the new or old CE?

The best way to check is to look at the XML showplan text (not its graphic representation). Near the top of the plan, the statement element carries an attribute like this:

CardinalityEstimationModelVersion="70" StatementSubTreeCost="0.854615"

or like this:

CardinalityEstimationModelVersion="120"

When CardinalityEstimationModelVersion is set to 70, the old CE is in use. When it is set to 120, the new CE is in use.

Summary

  • SQL Server 2014 introduces a new CE, which is active for all databases with the SQL Server 2014 compatibility level
  • The new CE calculates combined filter density/selectivity differently
  • The new CE treats ascending/descending key scenarios differently
  • There are also significant changes in how column densities of different tables are evaluated and combined in join situations; these changes were not described in detail in this article
  • The different calculations can result in different plans for a query compared with the old cardinality estimation
  • Depending on the workload or the application used, more intensive testing of the new CE algorithms might be needed in order to analyze the impact on business processes
  • Microsoft supports both the new and the old CE
  • Customers are highly encouraged to test the new CE thoroughly and, if the testing shows a beneficial or neutral impact on the performance of their business processes, to use it in production

EF6.1.0 RTM Available


Since the release of EF6 our team has been working on the EF6.1 release. This is our next release that includes new features.

 

What’s in EF6.1

EF6.1 adds the following new features:

  • Tooling consolidation provides a consistent way to create a new EF model. This feature extends the ADO.NET Entity Data Model wizard to support creating Code First models, including reverse engineering from an existing database. These features were previously available in Beta quality in the EF Power Tools.
  • Handling of transaction commit failures provides the CommitFailureHandler which makes use of the newly introduced ability to intercept transaction operations. The CommitFailureHandler allows automatic recovery from connection failures whilst committing a transaction.
  • IndexAttribute allows indexes to be specified by placing an [Index] attribute on a property (or properties) in your Code First model. Code First will then create a corresponding index in the database (a short sketch follows this list).
  • The public mapping API provides access to the information EF has on how properties and types are mapped to columns and tables in the database. In past releases this API was internal.
  • Ability to configure interceptors via the App/Web.config file allows interceptors to be added without recompiling the application.
    • System.Data.Entity.Infrastructure.Interception.DatabaseLogger is a new interceptor that makes it easy to log all database operations to a file. In combination with the previous feature, this allows you to easily switch on logging of database operations for a deployed application, without the need to recompile.
  • Migrations model change detection has been improved so that scaffolded migrations are more accurate; performance of the change detection process has also been enhanced.
  • Performance improvements including reduced database operations during initialization, optimizations for null equality comparison in LINQ queries, faster view generation (model creation) in more scenarios, and more efficient materialization of tracked entities with multiple associations.
  • Support for .ToString, String.Concat and enum HasFlags in LINQ Queries.
  • System.Data.Entity.Infrastructure.Interception.IDbTransactionInterceptor is a new interceptor that allows components to receive notifications when Entity Framework initiates operations on a transaction.
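As a minimal sketch of the IndexAttribute feature listed above (the Blog class, property names, and index name are illustrative assumptions):

using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;

public class Blog
{
    public int Id { get; set; }

    // A plain [Index] creates a non-clustered index on the Rating column the next
    // time the database is created or a migration is scaffolded.
    [Index]
    public int Rating { get; set; }

    // The attribute also supports an explicit name and uniqueness. Note that string
    // columns need a bounded length before SQL Server can index them.
    [Index("IX_Blog_Title", IsUnique = true)]
    [StringLength(200)]
    public string Title { get; set; }
}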

 

Where do I get EF6.1?

The runtime is available on NuGet. If you are using Code First then there is no need to install the tooling. Follow the instructions on our Get It page for installing the latest version of Entity Framework runtime.

The tooling for Visual Studio 2012 and Visual Studio 2013 is available on the Microsoft Download Center. You only need to install the tooling if you want to use Model First or Database First.

 

What’s new since EF6.1 Beta 1?

Since the Beta 1 of EF6.1 we’ve just been fixing bugs and tidying up the new features. There are no new features in the RTM since Beta 1.

 

Thank you to our contributors

We’d like to say thank you to folks from the community who contributed features, bug fixes, and other changes to the 6.1 release - RogerAlsing, ErikEJ, and mikecole.

In particular, we’d like to call out the following contributions:

  • Support for String.Concat and .ToString in LINQ queries (RogerAlsing)
  • Support for enum HasFlags method in LINQ queries (RogerAlsing)
  • Entity SQL canonical function support for SQL Server Compact (ErikEJ)
  • Fix for a bug that was affecting EF running under Mono (ErikEJ)

Cumulative Update #9 for SQL Server 2012 SP1

Dear Customers, The 9th cumulative update release for SQL Server 2012 SP1 is now available for download at the Microsoft Support site. Cumulative Update 9 contains all the SQL Server 2012 SP1 hotfixes which have been available since the initial release...(read more)

Cumulative Update #16 for SQL Server 2008 SP3

Dear Customers, The 16th cumulative update release for SQL Server 2008 Service Pack 3 is now available for download at the Microsoft Support site. Cumulative Update 16 contains all the hotfixes released since the initial release of SQL Server 2008...(read more)

SQL Server at our customers – Managing the database life cycle


Development projects that involve databases and Business Intelligence need to be aligned with the application code and require life-cycle and version management, just like a classic application.

A set of Microsoft tools makes it possible to put this efficient management of SQL/BI development in place and also to implement an "Application Lifecycle Management" (ALM) approach.

...(read more)

Use Enumeration types in OData


Enumeration is a very common data type in various data models. Enumeration types are a useful means to represent data that needs to support certain programmatic options (such as the ability to detect that one or more options are selected). OData V4 now supports Enumeration types, along with Primitive types, Complex types, and Entity types, and we expect there will be heavy usage of them.

Let’s look at an example. Suppose we have an entity type called Product, and we want to have a property member that stores the color of the product. We could use a string type property, but an enumeration type property would look better. We can define an enumeration type Color, and it contains members like Red, Green, Blue, etc. In Product definition we can define a member property called PackageColor, whose type is Color. Now we can assign color values to PackageColor. To get all the products that are in Green color, we can send a request like “/svc/Products?$filter=PackageColor eq Namespace.Color’Green’”.

An enumeration type can be defined with “IsFlags=true”, meaning that the members can be combined bitwise; otherwise the members are intended to be exclusive. The Color type in the example above is defined with “IsFlags=false”, as we don’t want its members to be combined (we don’t allow a color to be both green and blue). Another example is an enumeration type called AccessLevel, which indicates the rights on some resource. AccessLevel has members like None, Read, Write, and Execute. In the Product type we have another property named UserAccess which is of the AccessLevel type. We can assign combined values to UserAccess, such as “Read, Write”. And if we want to get all the products that we have Read and Write access to, we can send a query like “/svc/Products?$filter=UserAccess has Namespace.AccessLevel’Read, Write’”. The “has” operator is evaluated in a bitwise way. That means UserAccess may contain more bits than the value on the right of the operator.

How to use Enumeration types in your OData service? The following part will give the elaborated steps, as well as some code snippets to help you understand. You can get the complete sample code here.

On server side, you need to first build an EDM model that supports your scenario with enumeration types. With the model ready, your service needs to respond to client requests by writing the correct payload which may have enumeration type values in it. The second step is called serialization. Let’s look at the details.

1. Build the EdmModel.

To use Enum types, you need to first define them in the EdmModel. An enum type can be defined with "IsFlags=true" or “IsFlags=false”. The absence of the IsFlags attribute implies the default behavior, which is the same as "IsFlags=false".

The following code defines two Enum types: Color and AccessLevel, and an entity type Product which has two properties UserAccess and PackageColor, of type AccessLevel and Color respectively.
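The original listing is in the linked sample code; the snippet below is a minimal sketch using the ODataLib 6.x Edm APIs (the member numeric values are illustrative assumptions):

using Microsoft.OData.Edm;
using Microsoft.OData.Edm.Library;        // ODataLib 6.x namespace for EdmModel, EdmEnumType, ...
using Microsoft.OData.Edm.Library.Values; // EdmIntegerConstant

static class EnumModelSample
{
    public static EdmModel BuildModel()
    {
        var model = new EdmModel();

        // IsFlags = false: Color members are mutually exclusive.
        var color = new EdmEnumType("Namespace", "Color", EdmPrimitiveTypeKind.Int32, false);
        color.AddMember("Red", new EdmIntegerConstant(0));
        color.AddMember("Green", new EdmIntegerConstant(1));
        color.AddMember("Blue", new EdmIntegerConstant(2));
        model.AddElement(color);

        // IsFlags = true: AccessLevel members can be combined bitwise.
        var accessLevel = new EdmEnumType("Namespace", "AccessLevel", EdmPrimitiveTypeKind.Int32, true);
        accessLevel.AddMember("None", new EdmIntegerConstant(0));
        accessLevel.AddMember("Read", new EdmIntegerConstant(1));
        accessLevel.AddMember("Write", new EdmIntegerConstant(2));
        accessLevel.AddMember("Execute", new EdmIntegerConstant(4));
        model.AddElement(accessLevel);

        var product = new EdmEntityType("Namespace", "Product");
        product.AddKeys(product.AddStructuralProperty("Id", EdmPrimitiveTypeKind.Int32));
        product.AddStructuralProperty("PackageColor", new EdmEnumTypeReference(color, false));
        product.AddStructuralProperty("UserAccess", new EdmEnumTypeReference(accessLevel, false));
        model.AddElement(product);

        return model;
    }
}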

In $metadata, the enum types appear as EnumType elements with their members, and the Product properties reference them.

2. Write payload

The next step is to enable the service to write payloads that contain enumeration type values. Writing an entity that has enumeration type properties is not very different from writing other entities. The only difference is at the entry creation stage: an Enum property has to be assigned an ODataEnumValue object, which represents the Enum value. The code snippet below shows how to do it.
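A minimal sketch of that entry-creation stage (the property values are illustrative; the surrounding ODataWriter plumbing is omitted):

using Microsoft.OData.Core; // ODataEntry, ODataProperty, ODataEnumValue in ODataLib 6.x

static class EnumPayloadSample
{
    public static ODataEntry CreateProductEntry()
    {
        return new ODataEntry
        {
            TypeName = "Namespace.Product",
            Properties = new[]
            {
                new ODataProperty { Name = "Id", Value = 1 },
                // Enum properties are assigned ODataEnumValue objects.
                new ODataProperty { Name = "PackageColor", Value = new ODataEnumValue("Green", "Namespace.Color") },
                new ODataProperty { Name = "UserAccess", Value = new ODataEnumValue("Write", "Namespace.AccessLevel") }
            }
        };
    }
}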

One thing to notice is that the ODataEnumValue is initialized with a string (“Green”, “Write”). The string can also be the underlying value, like “1” or “4”.

Once our service is ready to support enumeration type values, we can build a client to consume the data. On the client side, we need to read the payload and construct the property values. This process is called deserialization. Let’s look at how the client does it.

3. Read payload

The following code shows a client that reads an entry with Enum properties. At the end, packageColor.Value and userAccess.Value have the string values of the Enum properties. In our sample, packageColor.Value would be “Green”, and userAccess.Value would be “Write”.
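As a minimal sketch, once the client has materialized an ODataEntry (for example from an ODataReader; error handling omitted), the enum values can be pulled out like this:

using System;
using System.Linq;
using Microsoft.OData.Core;

static class EnumReadSample
{
    public static void PrintEnumProperties(ODataEntry entry)
    {
        // Enum properties are materialized as ODataEnumValue instances.
        var packageColor = (ODataEnumValue)entry.Properties.Single(p => p.Name == "PackageColor").Value;
        var userAccess = (ODataEnumValue)entry.Properties.Single(p => p.Name == "UserAccess").Value;

        Console.WriteLine(packageColor.Value); // "Green"
        Console.WriteLine(userAccess.Value);   // "Write"
    }
}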

SQL Server 2014 releases April 1


SQL Server 2014, Microsoft’s cloud first data platform, released to manufacturing today

Today, Quentin Clark, corporate vice president of the Data Platform Group, announced that SQL Server 2014, the foundation of Microsoft’s cloud-first data platform, is released to manufacturing and will be generally available on April 1. Quentin’s blog discussed the tremendous momentum with the SQL Server business as well as SQL Server 2014’s new in-memory OLTP technology and hybrid cloud capabilities. Here we provide additional details on those capabilities and highlight ways customers are getting tremendous value with SQL Server 2014.


Delivering breakthrough performance with In-Memory OLTP

The worldwide data explosion is creating new challenges and new opportunities for enterprises seeking to transform their business to better respond to ever-changing market conditions. Businesses need better performance from their transactional applications to respond to customer demand in real time, not at the end of the day when the batch process completes. We have several great examples of customers using the new in-memory OLTP engine in SQL Server 2014 to deliver breakthrough performance to their mission critical applications, including:

  • Bwin is the world’s largest regulated online gaming company. SQL Server 2014 lets bwin scale its applications to 250K requests a second, a 16x increase from before, and provide an overall faster and smoother customer playing experience.
  • Ferranti, which provides solutions for the energy market worldwide, is collecting large amounts of data using smart metering. They use In-Memory OLTP to help utilities be more efficient by allowing them to switch from traditional meters that are read once a month to smart meters that provide usage measurements every 15 minutes. By taking more measurements, they can better match supply to demand. With the new system supported by SQL Server 2014, they increased from 5 million transactions a month to 500 million a day.
  • TPP, a clinical software provider, is managing more than 30 million patient records. With In-Memory OLTP, they were able to get their new solution up and running in half a day and their application is now seven times faster than before, peaking at about 34,700 transactions per second.
  • SBI Liquidity Market, an online services provider for foreign currency exchange (FX) trading, wanted to increase the capacity of its trading platform to support its growing business and expansion worldwide.  SBI Liquidity Market is now achieving better scalability and easier management with In-Memory OLTP and expects to strengthen its competitive advantage with a trading platform that is ready to take on the global marketplace.
  • Edgenet provides optimized product data for suppliers, retailers, and search engines including Bing and Google. They implemented SQL Server 2014 to deliver accurate inventory data to customers and are now experiencing seven times faster throughput, and their users are creating reports with Microsoft’s self-service BI tools to model huge data volumes in seconds. 


Enabling greater availability and data protection with new hybrid scenarios

The hybrid capabilities in SQL Server 2014 make it easier for customers to increase the resiliency of their applications, without adding significant additional cost.  We are bringing new value to customers, including the ability to add SQL Server 2014 running in a Windows Azure Virtual Machine as an additional secondary to your AlwaysOn solution. Lufthansa Systems is a full-spectrum IT consulting and services organization that serves airlines, financial services firms, healthcare systems, and many more businesses. To better anticipate customer needs for high-availability and disaster-recovery solutions, Lufthansa Systems piloted a solution on SQL Server 2014 and Windows Azure that led to faster and more robust data recovery, reduced costs, and the potential for a vastly increased focus on customer service. 

Backup to Windows Azure enables customers to configure backups, which are compressed and encrypted, directly to Azure for greater data protection, taking advantage of Microsoft’s global datacenters. Customers gain the benefits of Windows Azure storage, which offers geographic redundancy at a lower cost than maintaining additional hardware. Beyond SQL Server 2014, this technology will also be generally available as a standalone tool, SQL Server Backup to Windows Azure Tool, on April 1, supporting prior versions of SQL Server. In addition to greater data protection, SQL Server 2014, combined with the standalone tool, can provide significant CAPEX and OPEX savings as well as deliver on customer needs for greater consistency in backup strategies across their SQL Server environments. 


More customer value for mission critical applications and hybrid cloud

Beyond in-memory OLTP and these hybrid capabilities, there is a lot more to be excited about in SQL Server 2014:  

  • Enhanced In-Memory ColumnStore for data warehousing – First delivered with SQL Server 2012, the In-Memory Columnstore is now updatable with as much as 5x data compression for more real-time analytics support.
  • New buffer pool extension support to non-volatile memory such as solid state drives (SSDs) – Increases performance and extends your in-memory buffer pools to SSDs for faster paging.
  • Greater performance with new query optimization capabilities
  • Enhanced AlwaysOn – Built upon SQL Server 2012, AlwaysOn delivers mission critical availability with up to eight readable secondaries and enhanced online operations.  As I mentioned earlier, you can enroll a Windows Azure Virtual Machine as part of your high availability topology.
  • Enhanced security – Achieve greater compliance with new capabilities for separation of duties.  For example, a database administrator can manage data without seeing the sensitive data itself.
  • Greater scalability of compute, networking and storage with Windows Server 2012 and Windows Server 2012 R2.
    • Greater compute scale – Scale for up to 640 logical processors and 4TB of memory in a physical environment and up to 64 virtual processors and 1TB of memory per VM.  
    • Network virtualization and NIC teaming to increase network throughput and more easily move SQL Server between datacenters.
    • Storage virtualization with storage spaces to better optimize storage costs as well as improve resilience, performance and predictability.
    • Greater ease of use for migrating existing apps to a Windows Azure Virtual Machine 

SQL Server 2014 continues to deliver a complete business intelligence (BI) platform for both corporate and self-service BI to help you gain faster insights on any data. Our business intelligence capabilities have received strong recognition from analysts and have become even stronger with the recent release of Power BI for Office 365. Power BI is our cloud-based business intelligence service that gives people a powerful new way to work with data in the tools they use every day: Excel and Office 365.

 

Sign-up for the SQL Server 2014 release available April 1

Thank you to our amazing SQL Server and data platform community, who have made significant contributions to this release. For Community Technology Preview 2, there were nearly 200K evaluations of SQL Server 2014, including 20K evaluations with SQL Server 2014 running in a Windows Azure Virtual Machine.  We appreciate your active engagement in evaluating the new release, and we are excited to share the final production release with you on April 1. 

For those who want access to the release as soon as possible, please sign up to be notified once the release is available.  Also, please join us on April 15 for the Accelerate Your Insights event to learn about our data platform strategy and how more customers are gaining significant value with SQL Server 2014.  There will also be additional launch events worldwide, so check with your local Microsoft representatives or your local PASS chapter for more information on SQL Server readiness opportunities. 

Thanks.

 

Eron Kelly

General Manager

Data Platform Group


“Modernizing” Your Data Warehouse with Microsoft


Data warehousing technology began as a framework to better manage, understand, and capitalize on data generated by the business, and it worked extremely well for many years. This is a space Microsoft knows well, having delivered data warehousing capabilities in SQL Server since 1995. 

However, there are several forces working to stretch the traditional data warehouse. Data volume is expanding tenfold every five years. Even the most robust SMP warehouse will require costly forklift upgrades to a larger and more expensive hardware footprint to keep up with that growth. Companies are using real-time data to optimize their businesses as well as to engage in dynamic, event-driven processes. The variety of new data types is proliferating, with over 85 percent of new data coming from non-relational sources such as logs, mobile, social, RFID, and devices.

Modernizing your data warehouse with new technologies can help you meet the needs of today’s enterprise: connecting any volume and variety of data to business decision makers through agile, familiar BI. This was validated by The Data Warehousing Institute (TDWI), which published a checklist to enable the modern data warehouse.  At a high level, your new warehouse must be able to handle:

  • Data of All Volumes: The modern data warehouse can scale up to any volume of data starting from terabytes up to multi-petabyte scale on all data types – relational and non-relational. As an example, Virginia Tech was able to crunch data from DNA sequencers (growing over 15 petabytes a year) to do cancer research.
  • Real-Time Performance: The ability to work with data in real time and keep pace with increasing demands without losing performance. As an example, MEC was able to bring queries of customer online visitation metrics from four hours down to minutes.
  • Any Data Types: The ability to seamlessly integrate data of any type, from traditional relational sources to new non-relational sources. As an example, Direct Edge was able to join non-relational stock exchange messaging with their relational stock ticker data.

Microsoft has a comprehensive solution to modernize your data warehouse across software, appliance, and cloud for this new world of data. We invite you to learn more about our offerings:

 

  • Relational Data: SQL Server 2014 (software); SQL Server Fast Track and Parallel Data Warehouse with PolyBase (appliance); SQL Server for DW in a Windows Azure Virtual Machine (cloud)
  • Non-relational Data (Hadoop): Hadoop on Windows (software); Parallel Data Warehouse with PolyBase (appliance); Windows Azure HDInsight (cloud)

 

Product Offerings:

  • SQL Server 2014: Microsoft SQL Server 2014 will be generally available on April 1 and includes technologies like an updateable in-memory columnstore that can speed up data warehouse queries by 10-100x.
  • SQL Server Parallel Data Warehouse: SQL Server Parallel Data Warehouse (PDW) is a Massively Parallel Processing data warehouse appliance that can scale out queries to petabytes of data. It includes PolyBase, a feature that allows you to seamlessly query and integrate both relational data and Hadoop data with the same T-SQL query. 
  • SQL Server hosted in a Windows Azure Virtual Machine: SQL Server Enterprise for data warehousing can be installed and hosted in the cloud on Windows Azure Virtual Machines. This image takes advantage of best practices from the Fast Track reference architecture to tune SQL Server for data warehousing in Windows Azure.
  • Windows Azure HDInsight: Windows Azure HDInsight is a Hadoop-based service from Microsoft that brings a 100 percent Apache Hadoop solution to the cloud. You can seamlessly process data of all types with simplicity, ease of management, and an open Enterprise-ready Hadoop service all running in the cloud. We recently announced the general availability of HDInsight running Hadoop 2.2 clusters.
  • Hortonworks Data Platform for Windows: Through a strategic partnership with Hortonworks, Microsoft co-developed the Hortonworks Data Platform for Windows for customers who want to deploy Hadoop on their own servers. Learn more about Microsoft’s contributions to Hadoop.

You can learn more about Microsoft solutions for the modern data warehouse by tuning in for a live stream of our April 15th Accelerate your insights event and by visiting these resources:

The ASP.NET Web API 2.2 for OData release and the OData Client Code Generator release


We are very excited to announce the availability of nightly builds on MyGet for ASP.NET Web API 2.2 for OData v4.0, along with the OData Client Code Generator for writing OData v4 clients. The Web API 2.2 release is particularly noteworthy as it is the first Web API release with support for OData v4, including a selection of highly demanded features. For a summary of what’s new in OData v4.0, please visit What’s New in OData v4.

What’s New in ASP.NET Web API 2.2 for OData v4.0

New features

  1. Protocol and format changes from V3 to V4.
  2. OData attribute routing: This release allows you to define the routes in your controllers and actions using attributes (see the sketch after this list).
  3. Support for functions: This release allows you to define functions in your OData model and bind them to actions in your controller that implement them.
  4. Model aliasing: This release allows you to change the names of the types and properties in your OData model so that they differ from those in your CLR types.
  5. Support for limiting allowed queries: This release allows the service to define limitations on which properties of the model can be filtered, sorted, expanded, or navigated across.
  6. Support for ETags: This release allows you to generate an @odata.etag annotation, based on selected properties of the entity, that can be used in If-Match and If-None-Match headers in subsequent requests.
  7. Support for enums: This release improves our support for enums, which are now exposed as OData enumeration types.
  8. Support for $format: This release adds support for $format, so clients can specify the desired format of the response in the URL.

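A minimal sketch of OData attribute routing, assuming a Web API 2.2 OData v4 project that references the System.Web.OData packages; the controller, route template, and return value are illustrative, not taken from the release notes:

using System.Web.Http;
using System.Web.OData;
using System.Web.OData.Routing;

public class ProductsController : ODataController
{
    // Handles GET ~/Products({key})/Name without relying on convention-based routing.
    [HttpGet]
    [ODataRoute("Products({key})/Name")]
    public IHttpActionResult GetName([FromODataUri] int key)
    {
        // Look up the product and return its name (data access omitted in this sketch).
        return Ok("Sample product name");
    }
}
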
Where to get it

It is now available at ASP.NET Web API 2.2 for OData v4.0 on MyGet. We recommend version v5.2.0-alpha1-140307 as a good place to start kicking the tires, or go ahead and try the latest and greatest. Please refer to the instructions on how to use nightly builds on our CodePlex site.

Sample code

We have provided a comprehensive list of OData v4.0 service samples to accelerate your own v4.0 service implementation. They are available at https://aspnet.codeplex.com/SourceControl/latest#Samples/WebApi/OData/v4/

Tutorial Blog

http://blogs.msdn.com/b/webdev/archive/2014/03/13/getting-started-with-asp-net-web-api-2-2-for-odata-v4-0.aspx

OData Client Code Generator to Support OData v4.0 Spec

New Features

  1. OData v4.0 Spec:  This release is able to generate C# and VB .NET client-side proxy classes to consume an OData V4 service.
  2. Enum Support:  This release supports generating code for enum types and enum-typed properties for the OData Client library.
  3. Singleton Support: This release supports generating code for singletons. You can now compose LINQ queries against singletons.

Platform

We support the Professional, Premium, and Ultimate editions of Visual Studio 2010, 2012, and 2013.

Where to get it

http://visualstudiogallery.msdn.microsoft.com/9b786c0e-79d1-4a50-89a5-125e57475937

Blog and Tutorial

http://blogs.msdn.com/b/odatateam/archive/2014/03/11/how-to-use-odata-client-code-generator-to-generate-client-side-proxy-class.aspx

Call to Action

If your team has an existing OData service, is considering adding an OData service, or is migrating your V3 service to V4, now is an excellent time to engage with us. For any feature requests, issues, or ideas, please feel free to reach out to us.

 

Thanks,

The OData Team

OData core libraries now support OData v4


Hi all,

We are tremendously excited to announce that we have released version 6.0.0 of the OData core libraries to NuGet on Jan 27th. This release is particularly noteworthy as it is the first production-ready release with support for OData v4, the newest version of the OData protocol. We had two primary goals for this release:

  1. Feature parity. The new stack supports everything that was supported before, but all payloads and other functionality are now compliant with the v4 protocol.
  2. Selected new features. This release adds support for enum types, singletons and containment.

We have achieved both of these goals and are now well positioned to continue adding support for v4 features. 

Stack Prioritization & Strategy Adjustment

As you are probably aware, our team aligns with many teams inside and outside of Microsoft who are building OData services. This includes teams across every major division at Microsoft. Based on these engagements, we believe that Web API provides the right platform for OData services going forward and as such will be investing primarily in that platform for OData server stacks. We will of course continue to put significant resources into the OData core libraries and client, but we do plan to reduce investment in WCF Data Services as a stack for creating OData services. To mitigate the inconvenience this may cause, we are working on cleaning up the code and making it compatible with OData v4, and will then release that stack as open source. We do not plan to put any significant investment into adding v4-specific features to the WCF DS stack. Web API is actively working on adding support for OData v4 and nightly builds with OData v4 support will be available in February, with a public CTP to follow in March.

You might notice we have also made some branding changes to our client to align it more with the protocol. The client is and always has been an OData client (and not a WCF Data Services client), but now the name reflects the code more accurately. Also, we have adjusted our code generation strategy to use T4 rather than CodeDom. This means that if you want to have a customized code gen experience for your OData service, you can start from our T4 template and make whatever tweaks you wish.

Call to Action

If your team has an existing OData service or is considering adding an OData service, now is an excellent time to engage with us. In addition to building OData stacks, part of our charter is to help Microsoft align behind the OData protocol. If you need input on whether OData can do what your service needs to do, or input on what version of OData you should implement, please feel free to reach out to us.

Release Notes

The full release notes are as follows:

This release includes the core .NET libraries described below for implementing OData clients and services that comply with the Committee Specification 02 of the OASIS OData V4 Specification (http://docs.oasis-open.org/odata/odata/v4.0/cs02/part1-protocol/odata-v4.0-cs02-part1-protocol.doc). The libraries support the OData V4 JSON (http://docs.oasis-open.org/odata/odata-json-format/v4.0/cs02/odata-json-format-v4.0-cs02.doc) format; the Atom format is not supported in this release.

What is in the release?

  • Feature parity: This release supports the same set of features that were supported in version 5.6.0 of the OData core libraries. This release supports OData v4 only and is not backwards compatible with OData versions 1-3.
  • Enum support: The core libraries now have support for serializing and deserializing enum values in JSON payloads. The URI parser is able to parse enum values and operations including ‘has’ (see the sketch after this list).
  • Singleton support: The core libraries now have support for serializing and deserializing singleton values in JSON payloads. The URI parser is now able to parse singletons in paths.
  • Containment support: The core libraries now have support for serializing and deserializing contained values in JSON payloads. The URI parser is now able to parse contained paths.
  • Function support: The URI parser is now able to parse functions and function parameters in URLs.
  • OData v4 compatibility: The JSON format, $metadata format and URI parser have all been updated to support OData v4.
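
As a small illustration of the new URI parser capabilities, here is a minimal sketch, assuming an IEdmModel instance named model that defines a Products entity set with a flags enum property UserAccess of type Namespace.AccessLevel (all of these names are illustrative):

var parser = new ODataUriParser(
    model,
    new Uri("http://host/svc/"),
    new Uri("http://host/svc/Products?$filter=UserAccess has Namespace.AccessLevel'Read,Write'"));

ODataPath path = parser.ParsePath();          // resolves the Products entity set
FilterClause filter = parser.ParseFilter();   // the filter tree contains a 'has' binary operator
                                              // whose right operand is an enum constant node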

Known Limitations

  • This release of the OData core libraries targets functional equivalence with the 5.6.0 release as well as support for a few new features. This means that there are many new OData v4 features that are not supported yet in the core libraries.
  • Although the OData core libraries are capable of serializing the OData v4 Atom format, this functionality is not officially supported since the Atom specification has not yet made it to the CS02 stage.

Again, please feel free to reach out if you have any questions.

Thanks,

The OData Team

OData 6.1 and OData Client 6.1 are now shipped


Hi all,

As part of our commitment to continuously roll out new functionality for the OData V4 protocol, we are excited to announce that the OData Core Libraries 6.1 have been released to NuGet.  This release contains four packages: Core, EDM, Spatial, and the .NET client.

Call to Action

If your team has an existing OData service or is considering adding an OData service, now is an excellent time to engage with us. In addition to building OData stacks, part of our charter is to help Microsoft align behind the OData protocol. If you need input on whether OData can do what your service needs to do, or input on what version of OData you should implement, please feel free to reach out to us. 

Release Notes

Bug fixes:

  • Fixed a bug where the OData-Version header was not added automatically for IODataResponseMessage.
  • Fixed a bug that produced duplicate function-import elements in the service document when a function is overloaded.
  • Improved the JSON serialization performance for Edm.Binary type.

New Features:

  • ODataUriParser supports the new query option $search.
  • ODataUriParser supports $count, $filter, $top, $levels, $orderby, $search, and $skip inside $expand (see the sketch after this list).
  • EdmLib supports adding the Core.Description and Core.OptimisticConcurrencyControl annotations directly through its API. 
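
A minimal sketch of parsing nested $expand options, assuming an IEdmModel named model with Customers and Orders entity sets and an Amount property on Order (all names illustrative, not from the release notes):

var parser = new ODataUriParser(
    model,
    new Uri("http://host/svc/"),
    new Uri("http://host/svc/Customers?$expand=Orders($filter=Amount gt 100;$top=5)"));

SelectExpandClause expand = parser.ParseSelectAndExpand();

// Each expanded navigation property carries its own nested options.
foreach (var item in expand.SelectedItems.OfType<ExpandedNavigationSelectItem>())
{
    FilterClause nestedFilter = item.FilterOption;  // Amount gt 100
    long? nestedTop = item.TopOption;               // 5
}
// (OfType comes from System.Linq.)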

Thanks,

The OData Team

Pie in the Sky (March 21st, 2014)


It has been a busy, long week, but I have managed to find time to read a few links. Here are the interesting ones.

Cloud

Client/Mobile

.NET

JavaScript/Node.js

Ruby

Misc.

Enjoy!

-Larry
