In-Depth
Data Dilemma
Mapping a strategy for Microsoft's new data-programming models.
Microsoft trumpeted the arrival of Language Integrated Query (LINQ) and LINQ to SQL in .NET 3.5, but the rest of its new data-related technologies were still works in progress when the company released Visual Studio (VS) 2008 to manufacturing last November.
By the middle of May, ADO.NET Entity Framework (EF), ADO.NET Data Services (code-named "Astoria") and ASP.NET Dynamic Data had passed Microsoft's sniff test and were marshaled into the .NET 3.5 and VS 2008 Service Pack (SP) 1 betas. Microsoft plans to ship the final releases of both Service Packs this summer.
"We expect when you're using the Entity Framework, and writing applications against the Entity Framework, LINQ will be the primary way that you do that," says Michael Pizzo, Microsoft's principal architect in the Data Programmability Group. "But LINQ itself is a technology that allows you to write strongly typed queries over any LINQ provider, of which we have four shipping as of Visual Studio 2008 SP1."
Indeed, VS 2008 SP1 contains the initial implementation of the Data Programmability Group's strategy of combining LINQ and the Entity Data Model (EDM) into the standard data platform for all new data-related features of .NET and Visual Studio, such as ADO.NET Data Services, the EntityDataSource and ASP.NET Dynamic Data. EF version 1 is only the first step in a multiyear, multirelease program of enhancing the EDM and extending it to other apps, such as reporting services, data synchronization and extract, transform, load (ETL) operations.
"There are so many streams of activity on the data front at Microsoft that it's a bit dizzying," says Andrew Brust, chief of new technology at twentysix New York, a consulting firm and software integrator. "This is a good thing if you enjoy the amount of sheer creativity in the works now, but it's a bad thing if you like clarity and wish to avoid complex choices."
With .NET 3.5 ushering in LINQ and LINQ to SQL, and VS 2008 SP1 bringing EF and other data-related technologies to the .NET developer's toolkit, what strategies are developers adopting for building enterprise .NET apps with Microsoft's emerging data-access technologies?
"Let alone the choice between LINQ to SQL and the Entity Framework," says Brust, "the fact that [EF] offers the sub-choices of LINQ to Entities, Object Services and the Entity Client means that there's a lot of fragmentation out there, and a shakeout seems inevitable.
"Now layer on the fact that Windows Presentation Foundation and Silverlight have their own new data-binding models, which are at once rich yet lacking good tooling support in Visual Studio," Brust continues, "and things can be very frustrating."
Mapping Mayhem
When VS 2008 SP1 is released this summer, many developers will get a firsthand look at EF version 1, which consists of the EDM, a graphical EDM Designer and an EntityDataSource component for ASP.NET-bound controls. EF is the foundation for several technologies -- including ADO.NET Data Services and ASP.NET Dynamic Data -- that rely on relational databases as an object-persistence store and need an object/relational mapping (O/RM) layer to make the transition from on-disk tables to in-memory object graphs.
What distinguishes EF from the 40 or so other O/RM tools for .NET, like NHibernate, is the EDM, which is based on Dr. Peter Chen's famous Entity-Relationship (E-R) data model of 1976. The original EDM relied on three XML files: a conceptual schema that defines a set of entities and associations, a storage schema that represents the database tables and their relations, and a mapping file that connects the conceptual layer to the storage layer.
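The relationship among the three layers can be illustrated with a minimal Python sketch. It is not EF's XML format or API; the table, column and entity names are hypothetical, and plain dictionaries stand in for the three schema files.

```python
from dataclasses import dataclass

# Hypothetical storage schema: table and column names as they exist on disk.
STORAGE = {"tblCust": ["cust_id", "cust_nm", "cntry_cd"]}

# Hypothetical conceptual schema: the entity and properties the application sees.
CONCEPTUAL = {"Customer": ["Id", "Name", "Country"]}

# Hypothetical mapping layer: ties each conceptual property to a storage column.
MAPPING = {
    "Customer": {
        "table": "tblCust",
        "columns": {"Id": "cust_id", "Name": "cust_nm", "Country": "cntry_cd"},
    }
}


@dataclass
class Customer:
    Id: int
    Name: str
    Country: str


def validate(entity_name):
    """Check that every conceptual property maps to a column that really exists."""
    mapping = MAPPING[entity_name]
    table_columns = STORAGE[mapping["table"]]
    for prop in CONCEPTUAL[entity_name]:
        column = mapping["columns"][prop]
        if column not in table_columns:
            raise ValueError(f"{prop} maps to unknown column {column}")


def materialize(entity_name, row):
    """Turn a storage row (a dict keyed by column name) into an in-memory entity."""
    validate(entity_name)
    columns = MAPPING[entity_name]["columns"]
    values = {prop: row[col] for prop, col in columns.items()}
    return Customer(**values)


customer = materialize("Customer", {"cust_id": 7, "cust_nm": "Acme", "cntry_cd": "UK"})
print(customer)
```

Because application code touches only the conceptual names, a renamed column would require a change to the mapping dictionary alone, which is the separation the three-layer design is meant to provide.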
Both EF and LINQ to SQL's O/RM tools use a wizard to let developers select an existing persistence database and store its connection string in App.config or Web.config.
With EF, a wizard with dialogs lets developers select the tables, views, stored procedures and table-valued functions (TVFs) to include in their models. EF can connect to any RDBMS for which an EF-enabled managed-data provider is available. Providers for IBM DB2, IBM U2, Informix Dynamic Server, MySQL, Oracle, PostgreSQL and SQLite will be available when VS 2008 SP1 releases, according to a blog posting by David Sceppa, Microsoft's ADO.NET program manager.
EF supports two new query languages: Entity SQL and LINQ to Entities. Entity SQL is an ANSI SQL derivative that's common to all EF-enabled data providers, which are responsible for translating Entity SQL to their own SQL dialect. Entity SQL includes new reserved words to support added concepts like entity associations and nested DbDataRecords, but lacks data-update keywords. LINQ to Entities is a LINQ implementation with a command tree for translating LINQ queries to provider-compatible SQL queries.
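The idea of a provider-neutral query tree rendered into provider-specific SQL can be sketched in a few lines of Python. This is a toy illustration, not EF's command-tree API: the `Query` class and both translator functions are hypothetical, and literals are inlined only to keep the example short (real providers use parameters).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Query:
    """Toy stand-in for a provider-neutral query tree (all names hypothetical)."""
    entity_set: str
    where: Optional[Tuple[str, str, object]] = None  # (column, operator, value)
    take: Optional[int] = None                        # maximum number of rows


def quote(value):
    """Render a literal; strings get single quotes with embedded quotes doubled."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def where_clause(q):
    """Render the optional filter, shared by both dialects."""
    if q.where is None:
        return ""
    column, operator, value = q.where
    return f" WHERE {column} {operator} {quote(value)}"


def to_tsql(q):
    """Translate the tree for a dialect that limits rows with TOP."""
    top = f"TOP ({q.take}) " if q.take is not None else ""
    return f"SELECT {top}* FROM {q.entity_set}{where_clause(q)}"


def to_limit_sql(q):
    """Translate the same tree for a dialect that limits rows with LIMIT."""
    limit = f" LIMIT {q.take}" if q.take is not None else ""
    return f"SELECT * FROM {q.entity_set}{where_clause(q)}{limit}"


query = Query("Customers", where=("Country", "=", "UK"), take=5)
print(to_tsql(query))
print(to_limit_sql(query))
```

The same tree yields two different statements, which is why the query language can stay common across providers while each provider owns its dialect.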
LINQ to SQL, which is tethered to SQL Server 2000 and subsequent versions and Transact-SQL, also lets users drag tables and views from Server Explorer to the O/R Designer surface and stored procedures or TVFs to the designer's Methods Pane. So when should a developer use LINQ to SQL versus EF?
Microsoft's Pizzo offers this recommendation in an April 2007 blog post:
"LINQ to SQL supports rapid development of applications that query Microsoft SQL Server databases using objects that map directly to SQL Server schemas. LINQ to Entities supports more flexible mapping of objects to Microsoft SQL Server and other relational databases through extended ADO.NET Data Providers ... If you don't require any of these features, LINQ to SQL may provide a simpler solution for rapid development."
LINQ to SQL, which was written by Microsoft's Matt Warren and the C# team, is facing less adversity than EF version 1 (see "Entity Framework's Rocky Road to RTM"). The ADO.NET team, led by Pizzo, originally planned to release EF as a VS 2008 component; it was dropped from the final product just days after the beta 1 release in April 2007 when Microsoft decided that the technology wouldn't make the release to manufacturing (RTM) cut-off.
This gave the EF team breathing room to improve the O/RM process by merging the three XML schema files into one EDMX file and refining the EDM Designer, adding support for complex types -- also called value types -- and an Update From Database feature that conforms the EDM to altered storage schemas. The extra development time also let the EF team take advantage of a new Windows Communication Foundation (WCF) feature in SP1 that enables the DataContractSerializer to serialize and deserialize complete object graphs, including EntityReference and EntityCollection associations, which LINQ to SQL can't handle.
However, because their respective ObjectContext and DataContext objects aren't serializable, neither EF nor LINQ to SQL version 1 supports disconnected operations or n-tier service-oriented architecture directly. EF has no built-in security features, so authentication and authorization are based on Windows security or UserName/Password combinations specified in the database connection string.
Data-Driven Framework
One of the most controversial issues facing adoption of EF version 1 in enterprise-level .NET applications is that it's data-driven, not domain-driven. Advocates of domain-driven design believe that modeling apps on relational database schemas leads to deficient architecture.
The data-centric design and missing features haven't stopped Emeryville, Calif.-based IdeaBlade Corp. from integrating EF with its DevForce framework for programming n-tier, smart-client .NET apps. Microsoft's announcement of LINQ at PDC 2005 "led us to re-design DevForce to use LINQ and to ride on top of EF," recalls Ward Bell, VP of product management at IdeaBlade.
"Domain-driven design is ideal for greenfield projects, but most of us are dealing with big legacy databases," he says. "We're kind of handcuffed and a data-driven design is usually the fastest route to where we'll end up anyway."
Bell credits EF with "setting the standard for object/relational mapping with EDMX files so everyone can do their own add-on. I believe Entity Framework is real and Microsoft's commitment to it is total," he says. "Microsoft will create a huge ecosystem around the Entity Framework with related technologies, books, training and promotion. It'll take object/relational mapping out of the dark corner of niche products and into the enterprise mainstream."
Brust agrees with Bell's view of EF's status but warns: "Developers and, even more, developer managers need to be strategic right now, or they risk a large investment in technologies that currently offer less productivity, and which could go away."
It's clear, says Brust, that EF and EDM are strategic for Microsoft -- and the same holds true for Silverlight and Windows Presentation Foundation (WPF) in the presentation layer. "It's also clear," he says, "that the early versions of these technologies lack feature completeness and the tooling support necessary to make them economically sensible for mainstream dev teams. I'd advocate study, pilot projects and patience with this new generation of data-access technologies. A prohibition on their use is a bad idea, but recognize them for what they are at present: fodder for early adopters who are prepared to invest, learn and be tolerant of the bumps along the way."
EF's lack of an out-of-the-box n-tier solution created IdeaBlade's market opportunity: adding the enterprise features that are missing in EF version 1. DevForce EF, which is at the release candidate stage, includes a middle-tier Business Objects Server (BOS) for moving domain objects over the network. Remote clients invoke Entity SQL and LINQ to Entities queries, which DevForce sends to the BOS for execution by EF on the host side of the cloud.
In EF version 1, performance is a serious problem for distributed clients, according to Bell. DevForce substitutes its client cache for EF's client-side ObjectContext. "With our queryable object cache you can avoid redundant trips to the server and you can support offline scenarios," he says.
EF's visual EDM Designer can handle small models mapped to tens of tables, but it's unwieldy for models built from the hundreds or thousands of tables that are typical of large apps. "The EDM diagrams look like spaghetti," Bell says. IdeaBlade offers an alternative designer that enables viewing and editing of EDMX files within a grid. The company expects to release a beta version of a Silverlight 2 client for DevForce EF next month.
Some frameworks use EF only as a data layer. Rockford Lhotka, author of the Component-based Scalable Logical Architecture business-object framework for .NET (CSLA .NET) and principal technology evangelist for Magenic Technologies Inc., is taking this approach. "While I think that the Entity Framework may become a truly powerful technology in later versions," he says, "in the short term I intend to use it as a data access layer.
"The primary purpose of CSLA .NET is to enable the creation of a powerful object-oriented business layer. Technologies like EF simplify the process of persisting CSLA business objects by raising the level of abstraction when interacting with the database. I expect to use EF behind CSLA, just like I use ADO.NET or LINQ to SQL today," explains Lhotka.
Astoria's Data Layer
EF is the preferred data layer for ADO.NET Data Services and the only data source that currently supports data updates. ADO.NET Data Services is a RESTful front-end for EF or LINQ to SQL that takes advantage of the new Atom Publishing Protocol (AtomPub) or JavaScript Object Notation (JSON) serialization options of WCF in .NET 3.5. It substitutes HTTP POST, GET, PUT and DELETE methods for the conventional create, retrieve, update and delete (CRUD) Web service methods that emit dynamic SQL or execute stored procedures. All data-related operations depend on an addressing scheme with URIs that specify a particular entity or entity set.
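The verb-to-CRUD correspondence and the URI addressing scheme can be sketched as a small in-memory dispatcher in Python. This is an illustration under stated assumptions, not the ADO.NET Data Services implementation: the store, the URI pattern, the POST body layout and the status codes chosen are all hypothetical simplifications.

```python
import re

# In-memory stand-in for a data source; the entity set and key are hypothetical.
STORE = {"Customers": {"ALFKI": {"CompanyName": "Alfreds Futterkiste"}}}

# Matches /EntitySet (a whole set) or /EntitySet('key') (a single entity).
URI_PATTERN = re.compile(r"^/(?P<set>\w+)(?:\('(?P<key>[^']+)'\))?$")


def handle(method, uri, body=None):
    """Dispatch an HTTP method plus URI to a CRUD operation.

    Returns a (status_code, payload) tuple.
    """
    match = URI_PATTERN.match(uri)
    if not match or match.group("set") not in STORE:
        return 404, None
    entities = STORE[match.group("set")]
    key = match.group("key")

    if method == "GET" and key is None:       # retrieve the whole entity set
        return 200, list(entities.values())
    if method == "POST" and key is None:      # create; body carries key and entity
        entities[body["key"]] = body["entity"]
        return 201, body["entity"]
    if key is None:                           # PUT/DELETE need a single entity
        return 405, None
    if key not in entities:
        return 404, None
    if method == "GET":                       # retrieve one entity
        return 200, entities[key]
    if method == "PUT":                       # update (full replacement)
        entities[key] = body
        return 204, None
    if method == "DELETE":                    # delete
        del entities[key]
        return 204, None
    return 405, None
```

A call such as `handle("GET", "/Customers('ALFKI')")` addresses one entity, while `handle("GET", "/Customers")` addresses the whole set; no operation-specific method names are needed because the HTTP verb carries the intent.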
Seattle-based software development firm Vertafore Inc. is building an integration interface for a large, Software as a Service-based app. "We're looking very closely at EF/Astoria as a way to quickly build out the breadth of the interface compared to doing it by hand," says Chris Kinsman, Vertafore's VP of development. "We've tried using Microsoft's Web Services Software Factory 1.0 and found that while it ramped us up quickly, it also was a nightmare to extend and maintain. We then started doing code generation but it just didn't feel right.
"Now we're investigating Astoria," he says, "because we like the idea of providing a RESTful data layer. However, we're concerned about the lack of guidance for authentication/authorization and it doesn't appear trivial to add. We get the feeling that it's potentially a good starting point but might take a bit of time to mature."
Elisa Flasko, community program manager for Microsoft's Data Programmability Group, responds to Kinsman's concern in an e-mail: "Astoria doesn't directly implement authentication mechanisms or attempt to impose a particular access-control policy (it's unlikely that a single such policy would satisfy all use cases); instead, it provides the appropriate hooks and allows for the application of specific patterns for authentication and access controls so that developers can re-use existing packages (for example, WCF built-in authentication schemes, ASP.NET role provider) or roll their own (for example, row-level security using interceptors)."
Integration with ASP.NET security enables authentication over HTTP, but authorization will require additional customization. Data-enabled Silverlight 2 projects probably will be the primary host for ADO.NET Data Services clients.
The client library (System.WebClient.dll) provides DataContext and DataQuery objects that mimic EF's ObjectContext and ObjectQuery types. A command-line tool, DataSvcUtil.exe, generates the class file for an ADO.NET Data Services client. ADO.NET Data Services clients support a LINQ subset unofficially called LINQ to REST; data sources must support IQueryable<T>, which includes EF and LINQ to SQL. Persisting entity changes requires the data source to support IUpdatable, which only EF implements today. Concurrency conflict management uses an ETag with a response header that contains a timestamp.
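ETag-based conflict detection follows the standard HTTP pattern: the client receives an ETag with the entity and sends it back in an If-Match header when updating, and the server rejects the update if the entity has changed in the meantime. The Python sketch below simulates that exchange in memory. It is an assumption-laden illustration, not the Data Services wire format: the function names are hypothetical, and the ETag here is a hash of the entity's state rather than the timestamp described above.

```python
import hashlib
import json

# One stored entity; a hypothetical record standing in for a database row.
record = {"Id": 1, "Name": "Acme", "Version": 1}


def etag_for(entity):
    """Derive an opaque ETag from the entity's current state (hash, not timestamp)."""
    payload = json.dumps(entity, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha1(payload).hexdigest() + '"'


def get():
    """Simulate GET: return a copy of the entity plus its ETag header value."""
    return dict(record), etag_for(record)


def put(changes, if_match):
    """Simulate PUT with an If-Match header.

    Returns 204 on success, or 412 (Precondition Failed) if the stored
    entity no longer matches the ETag the client read earlier.
    """
    if if_match != etag_for(record):
        return 412
    record.update(changes)
    record["Version"] += 1
    return 204
```

If two clients read the same entity and both try to save, the first `put` succeeds and changes the stored state, so the second client's stale ETag no longer matches and its update is refused instead of silently overwriting the first.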
Troy Magennis, enterprise software architect for Bill Gates' Corbis Corp. and curator of the Hooked on LINQ wiki, is test-driving LINQ to SQL/ADO.NET Data Services. "I've been building prototypes using ADO.NET Data Services for about the last month," he says. "I'm incredibly impressed with its cleanliness. I think it's a lighter-weight technology for providing access to simple domain data. EF really fried my brain having to deal with three models in one XML file. I found the complexity of relationships, even simple one-to-one and one-to-many, difficult to keep working."
Magennis says Corbis will still use SOAP services for software integration, both for its own internal development and for engagement with third parties -- but ADO.NET Data Services will do everything else. "Security and Denial of Service issues are my main areas of concern, but I think the positives outweigh the negatives of this technology," he says. "We're backing Astoria with LINQ to SQL and staying read-only until we nail the security issues."
Dynamic Data as an O/RM Front-End
Another brand-new technology, ASP.NET Dynamic Data (DD), is an SP1 candidate for using EF's EntityDataSource or LINQ to SQL's LinqDataSource controls to fully scaffold data-entry Web pages. The pages are similar to those generated by Ruby on Rails but much more easily customized and extended. DD is more egalitarian than ADO.NET Data Services in its support for diverse data sources; all that's required to hook up to DD is a class with a property that implements LINQ's IQueryable<T> interface for each entity set.
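Scaffolding of this kind boils down to discovering entity sets on a context object and rendering each property through a reusable per-type template. The Python sketch below shows the idea only; it does not reflect the Dynamic Data API, and the `CatalogContext` class, `Product` entity and template table are hypothetical.

```python
from dataclasses import dataclass, fields
from html import escape


@dataclass
class Product:  # hypothetical entity type
    Name: str
    Price: float
    Discontinued: bool


class CatalogContext:
    """Hypothetical data context: each public attribute is one entity set."""

    def __init__(self):
        self.Products = [Product("Chai", 18.0, False), Product("Tofu", 23.25, True)]


# One renderer per value type, defined once and reused by every generated page.
FIELD_TEMPLATES = {
    str: lambda value: escape(value),
    float: lambda value: f"{value:.2f}",
    bool: lambda value: "yes" if value else "no",
}


def scaffold_table(context, set_name):
    """Build an HTML table for one entity set without any per-entity page code."""
    rows = getattr(context, set_name)
    if not rows:
        return "<table></table>"
    columns = [f.name for f in fields(rows[0])]
    header = "".join(f"<th>{name}</th>" for name in columns)
    body = ""
    for row in rows:
        cells = ""
        for name in columns:
            value = getattr(row, name)
            cells += f"<td>{FIELD_TEMPLATES[type(value)](value)}</td>"
        body += f"<tr>{cells}</tr>"
    return f"<table><tr>{header}</tr>{body}</table>"


html_table = scaffold_table(CatalogContext(), "Products")
print(html_table)
```

Adding a second entity set to the context would produce a second page with no new rendering code, and changing how every price is displayed means editing one entry in the template table.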
Solutions Design BV is enabling its LLBLGen Pro O/RM and code-generating tool as a DD data source. LLBLGen Pro lead developer Frans Bouma recounts his introduction to DD: "At first I wasn't very interested in Dynamic Data because it looked like demoware," he says. "Ironically, after Laminar Consulting Services CTO Bryan Reynolds gave me a demo and showed me how easy it was to use and extend, I was sold and decided to build support for it into LLBLGen Pro. It's great that the Dynamic Data folks at Microsoft opened up their API so we were able to support DD in full."
But Bouma cautions: "An application that can create a simple Web data-entry application quickly is a valuable asset in every developer's toolbox. But there's a catch: Developers have to remember that the UI isn't the application, and that the rules and patterns for well-engineered software apply equally to code generated by RAD tools."
Bouma also says that new code to make LLBLGen Pro a data source for ADO.NET Data Services will be added before SP1 ships.
DD lacks authentication and authorization features, so Steve Naughton, a developer of Web applications for the construction industry in the United Kingdom, published a tutorial supplemented by a database implementation on his C# Bits blog.
"My primary use for Dynamic Data is prototyping data-entry systems rapidly," Naughton says. "I think that the reusable FieldTemplates alone make it worth investigating by any enterprise developer who's the least bit dubious about the scaffolding process as a whole. Don't forget that you can write a traditional ASP.NET app with Dynamic Data and just take advantage of the DynamicControl and DynamicField controls to make use of the FieldTemplates. The FieldTemplates greatly reduce the time needed to develop and maintain the site by strictly enforcing the 'Once and Only Once' design pattern."
The jury's still out on the question of the practicality of EF version 1 by itself for data modeling and object persistence in large-scale projects. But that doesn't preclude its use as an O/RM tool for the small and midsize solutions that contribute the lion's share of .NET developers' business. EF version 1 will also gain market share as the data source for ADO.NET Data Services and DD, which are relatively feature-complete. The ADO.NET team's new advisory council -- which meets for the first time this month -- and its plans for more transparent design methodology for EF version 2 increase the likelihood of Microsoft getting it right the second time.