
Why cyn.in data is stored in ZODB and data migration mechanisms


Introduction

The ZODB is a powerful object database for Python objects. It is very mature, having been in use for more than a decade. It is transactional and offers advanced features such as clustering (ZEO) and blob support. All Zope-based application servers, and thus content management systems like cyn.in that rely on Zope, use the ZODB as their default data storage, so it has seen a lot of battle testing.

ZODB Concepts

ZODB is to relational databases as Python is to statically typed languages such as Java. For many applications, especially applications involving complex objects, a database like the ZODB is a lot easier to deal with. This is a key reason why ZODB is the most popular back end for Zope applications. The ZODB takes a minimalist approach to the services it provides: basic persistence and little else. Other services, like security and indexing, are provided at the Zope application server level.

The Zope application server provides clustering through a client-server arrangement between "ZEO clients" and a "ZEO server". The ZEO server gives any number of ZEO clients access to the shared database. The clients in turn provide end-user services such as Web Services, WebDAV and FTP, and each ZEO client keeps its own caches, including a catalog cache. Load balancing is delegated to a software proxy server such as Apache HTTPD or Squid, and in high-load scenarios it can be handled by hardware load balancers.

Comparison and integration with other storage technologies like RDBMS and external file storage

Transparent object storage and retrieval

An important difference between object databases and relational databases is that object databases store already assembled objects. In a relational database, objects that can't be represented as records of basic data values must be assembled through database joins. This can be expensive and cumbersome, both conceptually and computationally.

In ZODB, the object is directly "pickled", which is the Python term for serializing an object along with all of its persistent attributes. A pickled object can reference other objects, which in turn store their own attributes, and so on. This gives cyn.in a flexibility advantage because each stored object is alive: any changes to it are transparently persisted and are immediately available across the application server. Object properties are indexed and can be retrieved directly from the index very quickly without needing to "awaken" the full object. Awakening is only required when non-indexed data must be read.
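
As a rough illustration of what pickling an object graph means, the sketch below uses only the standard library `pickle` module rather than ZODB itself. The `page_a`, `page_b` and `author` objects are made up for this example; ZODB adds transactions and transparent persistence on top of this idea.

```python
import pickle
from types import SimpleNamespace

# Illustrative objects only: two pages that reference the same author object.
author = SimpleNamespace(username="boss")
page_a = SimpleNamespace(title="Why ZODB", author=author)
page_b = SimpleNamespace(title="Data transfer", author=author)

# Serialize the whole graph (both pages plus the referenced author) in one step.
stored = pickle.dumps([page_a, page_b])

# Loading returns assembled objects; no joins are needed to rebuild them.
restored_a, restored_b = pickle.loads(stored)

print(restored_a.title, restored_a.author.username)
# The shared reference survives the round trip: both pages point at one author.
print(restored_a.author is restored_b.author)
```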

Thus, content types and their object instances are flexible enough to accommodate new properties, and changes can be made through simple migration procedures. This flexibility is a key requirement for an iteratively developed product like cyn.in.

As an example, consider how any content type in cyn.in has several indexed fields which are added to it at runtime and updated on activity. These include:

  • lastchangedate: This lets us know when the last activity on an object took place.
  • lastchangeaction: This lets us know what the activity was - one of: created, edited, discussed, workflow status changed, and so on.
  • lastchangeperformer: This lets us know which user performed the last activity. The field stores the username, which can be resolved to the object of the user who performed the action.

These fields were added across all content types and are used without any modification to the data storage code of, or around, the content types. The work was done completely externally. This kind of flexibility is difficult to achieve in relational database management systems because they store object data in tabular format, where a single object must be persisted across several different tables, each storing intrinsically related data in record form. Any change to the schema of the tables means checking, validating and usually rewriting all intersecting SELECT, UPDATE and DELETE statements, with the further manual overhead of referential and structural integrity checks.
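
The idea of adding fields from outside a content type can be sketched in plain Python. This is not cyn.in's actual code: `BlogPost`, `record_activity` and the in-memory `activity_index` are hypothetical names used only to show that the content class itself never changes.

```python
from datetime import datetime, timezone

# Hypothetical in-memory index: field name -> {object id -> value}.
activity_index = {
    "lastchangedate": {},
    "lastchangeaction": {},
    "lastchangeperformer": {},
}


class BlogPost:
    """A content type that knows nothing about activity tracking."""

    def __init__(self, object_id, title):
        self.object_id = object_id
        self.title = title


def record_activity(obj, action, username, when=None):
    """Attach the three activity fields to any object and update the index."""
    when = when or datetime.now(timezone.utc)
    values = {
        "lastchangedate": when,
        "lastchangeaction": action,
        "lastchangeperformer": username,
    }
    for field, value in values.items():
        setattr(obj, field, value)  # added at runtime, no class change needed
        activity_index[field][obj.object_id] = value


post = BlogPost("post-1", "Release notes")
record_activity(post, "edited", "boss")

# Answer "who changed it last?" from the index, without touching the object.
print(activity_index["lastchangeperformer"]["post-1"])
```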

Integration

There are cases, however, where a relational storage strategy makes sense. These cases usually revolve around transaction-oriented activities, such as logging-like workloads with routine additions happening every second or so. In cases like these, external RDBMS systems can easily be harnessed to store and manage the data. cyn.in can use these systems directly in either read-only or read-write mode.

This is typically required when integrating with external systems. An example would be an existing transaction-oriented customer order booking system, where the order data is managed by the external system and cyn.in reads the data directly from its database tables and also provides updates when and where necessary.

Note, however, that in such scenarios Cynapse recommends that integration typically be done at the application level, either through an integration API or through a customized integration server or software application that participates in the actual application flow. When an order transaction happens (in the above example), the integration API or server would trigger the appropriate activity in cyn.in, as opposed to having cyn.in monitor database tables directly.

External File Storage

cyn.in can also provide external storage of files that are managed by it, directly on disk. This strategy provides increased access speeds of binary files, especially in the case of large files like video and audio media. The cyn.in application server in this case only provides a reference to the actual file, and files are directly provided by the end-serving service. BLOBs and files are directly stored on the native operating system storage file system in this strategy and can also be structured in a way that they can be directly retrieved from the file system.

Notes:
  1. By its very nature, a file stored on the OS file system cannot be updated directly and can only be accessed directly as read-only. This is required to ensure that referential integrity is maintained with the cyn.in server.
  2. This strategy is based on existing technology but is not immediately available. Code changes in cyn.in are required to store BLOBs externally like this, but the work can be done in feature-expedition mode.

Data Transfer mechanisms

In cyn.in, several data transfer and access mechanisms are available. Data entry and exit points are crucial to the integration strategy of cyn.in and as a business strategy, it is required to ensure adoption without concern or fears of "data lock in".

cyn.in provides several mechanisms of direct file and data access. These are detailed below.

A note on Security:
  1. cyn.in is a secure storage system. All content data stored in it cannot be revealed to outside access (by API or browser) without authentication.
  2. Even after authentication, ACL and role structuring must be honored - a person who does not have access to a resource will be denied.
  3. Thus, ALL access mechanisms require the usage of the end-user's username and password, and any request is authenticated and checked for privileges before being granted.

Direct File Download

The easiest and most popular direct file access mechanism with cyn.in is direct file download. Any file in cyn.in can be accessed by using HTTP (or HTTPS, if it is set up) BASIC authentication. In cases where clients do not provide login and password input boxes, HTTP BASIC authentication via URL encoding is also known to work in most cases.

HTTP BASIC authentication follows this access pattern:

  1. Client issues URL request.
    Example URL: http://demo.cyn.in/root/marketing-space/Customer-Satisfaction-Survey-2008.pdf
  2. cyn.in server issues HTTP BASIC Authentication Challenge
  3. Client issues URL request again along with username and password authentication
  4. cyn.in server checks privileges and provides download
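
The four steps above can be sketched with Python's standard `urllib.request`, assuming the server really answers the first request with a 401 BASIC challenge as in step 2. The helper name `build_download_opener` is invented for this example, and no request is actually sent by the code as written.

```python
import urllib.request


def build_download_opener(site_url, username, password):
    """Return an opener that answers HTTP BASIC challenges for site_url."""
    manager = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under site_url".
    manager.add_password(None, site_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(manager)
    return urllib.request.build_opener(handler), manager


opener, manager = build_download_opener("http://demo.cyn.in/", "boss", "password")

# Steps 1 to 4 then happen inside a single call (not executed here):
# with opener.open(file_url) as response:
#     data = response.read()
```

The opener first sends the request without credentials, receives the challenge, and then repeats the request with the stored username and password.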

In cases where username and password entry is not possible, direct username and password can be encoded as per the following scheme:

http://username:password@resourceurl

Example URL: http://boss:password@demo.cyn.in/root/marketing-space/Customer-Satisfaction-Survey-2008.pdf

In the above example, the first URL (in point 1) will redirect to a login page, whereas the second URL will be served directly. This is easy to verify with a command line client like wget, where a URL carrying the username and password can be used to download any file from cyn.in directly, as below:

wget http://username:password@cynin_resource_url
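
Python's `urllib.request` does not send credentials embedded in a URL by itself, so a script has to split them out. The sketch below is one way to do that under the stated scheme; `request_from_credential_url` is a made-up helper name, and the credentials are sent up front in an `Authorization` header rather than after a challenge.

```python
import base64
import urllib.request
from urllib.parse import unquote, urlsplit, urlunsplit


def request_from_credential_url(url):
    """Turn http://user:pass@host/path into a Request with an Authorization header."""
    parts = urlsplit(url)
    username = unquote(parts.username or "")
    password = unquote(parts.password or "")
    # Rebuild the network location without the "user:pass@" prefix.
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    clean_url = urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(clean_url)
    request.add_header("Authorization", f"Basic {token}")
    return request


request = request_from_credential_url(
    "http://boss:password@demo.cyn.in/root/marketing-space/Customer-Satisfaction-Survey-2008.pdf"
)
print(request.full_url)
# To download, pass the request to urllib.request.urlopen (not executed here).
```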

Direct File Upload

A file can be uploaded directly to cyn.in by using the standard HTTP PUT method. A PUT request with a file on any Space in cyn.in causes the file to be added directly, as long as HTTP BASIC authentication is correctly used.

This can be easily used to provide bulk data migration facilities with any capable client or custom scripts.

The easiest example of this is the curl command line client, which is available on most operating systems including Linux, Mac and Windows. Using curl, a simple command line like the following is sufficient to upload a file:

curl -T <filename> http://username:password@cynin_space_url
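
For bulk migration from a custom script, the same upload can be sketched in Python. The helper `build_upload_request` is hypothetical, and the choice to append the file name to the Space URL is an assumption of this sketch, not documented cyn.in behaviour.

```python
import base64
import urllib.request
from urllib.parse import quote


def build_upload_request(space_url, filename, content, username, password):
    """Build an authenticated HTTP PUT request that uploads content into a Space."""
    # Assumption: the new file is addressed as <space_url>/<filename>.
    target = space_url.rstrip("/") + "/" + quote(filename)
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(target, data=content, method="PUT")
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_upload_request(
    "http://demo.cyn.in/root/marketing-space",
    "report 2008.pdf",
    b"%PDF-1.4 example bytes",
    "boss",
    "password",
)
print(request.get_method(), request.full_url)
# Sending is a single call: urllib.request.urlopen(request) (not executed here).
```

A migration script would loop over a directory and build one such request per file.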

WebDAV

cyn.in server provides a capable WebDAV-based file server view of all resources within it. Any cyn.in site can easily be mounted as a WebDAV drive, after which files can be managed in a natural way, with full support for drag and drop as well as clipboard copy and paste.

Non-file resources like wiki pages, blog entries and web links are also available on the WebDAV interface and can be downloaded for transfer to other cyn.in sites or decoded to be used in other systems.
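
Besides mounting a drive, a script can list a WebDAV collection with a standard PROPFIND request. The sketch below only builds such a request; `build_listing_request` is an invented name, authentication is left out for brevity, and which properties cyn.in returns is not confirmed by this page.

```python
import urllib.request

# Standard WebDAV request body asking for three common properties.
PROPFIND_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop><D:displayname/><D:getcontentlength/><D:getlastmodified/></D:prop>
</D:propfind>"""


def build_listing_request(collection_url):
    """Build a PROPFIND request that lists a collection and its direct children."""
    request = urllib.request.Request(collection_url, data=PROPFIND_BODY, method="PROPFIND")
    request.add_header("Depth", "1")  # 1 = the collection plus immediate members
    request.add_header("Content-Type", "application/xml; charset=utf-8")
    return request


request = build_listing_request("http://demo.cyn.in/root/marketing-space")
print(request.get_method(), request.full_url)
```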

FTP

cyn.in server can also provide an FTP interface where files and non-file data are exposed alike, over FTP. Data can be bulk downloaded and uploaded using any capable FTP client.

RSS

With cyn.in 3.0, RSS feeds are available for all Spaces, with full support for enclosures and detailed metadata such as tags, author and update dates. Any RSS feed reader client can use these feeds both to view content and to download content and files.
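
A feed with enclosures can be consumed with a few lines of standard-library Python. The `SAMPLE_FEED` below is a hand-written RSS 2.0 fragment for illustration, not real cyn.in output, and `list_enclosures` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in RSS 2.0 form; a real feed would be fetched over HTTP.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>marketing-space</title>
    <item>
      <title>Customer Satisfaction Survey 2008</title>
      <category>survey</category>
      <pubDate>Mon, 01 Dec 2008 10:00:00 GMT</pubDate>
      <enclosure url="http://demo.cyn.in/root/marketing-space/Customer-Satisfaction-Survey-2008.pdf"
                 length="1024" type="application/pdf"/>
    </item>
  </channel>
</rss>"""


def list_enclosures(feed_xml):
    """Return one dict per item that carries a downloadable enclosure."""
    results = []
    for item in ET.fromstring(feed_xml).iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # plain item without an attached file
        results.append({
            "title": item.findtext("title"),
            "tags": [category.text for category in item.findall("category")],
            "url": enclosure.get("url"),
            "type": enclosure.get("type"),
        })
    return results


items = list_enclosures(SAMPLE_FEED)
for entry in items:
    print(entry["title"], entry["url"])
```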

ATOM and podcast feeds

cyn.in 3.0 provides ATOM and iTunes feeds as well, which can be used in any capable feed reader client for consuming content and file data. Podcast feeds can be used to consume and publish audio and video to any external system directly from within cyn.in.

REST API

Along with the standard RSS and ATOM feed mechanisms, cyn.in server also provides a REST API for reading any data from within it. Any Space within cyn.in can be queried with the following parameters:

  • Content Types: Filtering of individual content types (Blog posts, wiki pages, images, video, audio, discussions and so on) with single and multiple data types combinations
  • Tags: Filtering by one or more tags
  • Modified date range: Filtering by start and end date of modification of items
  • Modifier: Filtering by creating and editing users
  • Search text: Filtering of results directly by matching search text relevance

Matched items are returned in XML format (RSS, ATOM), and custom feeds are easily possible. Parameters are encoded directly into the URL, which provides an easy mechanism for fetching up-to-date data.
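
Since this page does not list the actual parameter names, the sketch below uses placeholders (`type`, `tag`, `modified_from`, `modified_to`, `modifier`, `q`) purely to show how such filters could be encoded into a URL. The function name and the endpoint URL are assumptions as well.

```python
from datetime import date
from urllib.parse import urlencode


def build_query_url(endpoint_url, content_types=(), tags=(), modified_from=None,
                    modified_to=None, modifier=None, search_text=None):
    """Encode the filters into a query string (parameter names are placeholders)."""
    params = [("type", name) for name in content_types]
    params += [("tag", tag) for tag in tags]
    if modified_from is not None:
        params.append(("modified_from", modified_from.isoformat()))
    if modified_to is not None:
        params.append(("modified_to", modified_to.isoformat()))
    if modifier:
        params.append(("modifier", modifier))
    if search_text:
        params.append(("q", search_text))
    return endpoint_url + "?" + urlencode(params)


url = build_query_url(
    "http://demo.cyn.in/root/marketing-space",
    content_types=["wiki page", "blog post"],
    tags=["survey"],
    modified_from=date(2008, 1, 1),
    modifier="boss",
)
print(url)
```

Because the whole query lives in the URL, the same link can be bookmarked or polled on a schedule to pick up new items.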

XML-RPC API

A comprehensive XML-RPC API is available for direct method-based integration with other services. The cyn.in desktop client uses this API for consuming data from the cyn.in server.

Current methods include:

  • Reading of data of any cyn.in content type
  • Recent update lists of items by date, users, and so on
  • Search
  • Comments discussion
  • Administrative creation of users
  • Many more functions, new methods can be added easily
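
Python ships an XML-RPC client in `xmlrpc.client`. The sketch below only shows what a call looks like on the wire; the method name `search` and its single argument are hypothetical, since the actual method signatures are not listed on this page.

```python
import xmlrpc.client

# Encode a call to a hypothetical "search" method with one string argument.
payload = xmlrpc.client.dumps(("survey",), methodname="search")
print(payload)

# Decoding the payload recovers the same call, as a server would see it.
params, method = xmlrpc.client.loads(payload)
print(method, params)

# Against a live server the call would go through a proxy object instead
# (endpoint URL is an assumption, not executed here):
# proxy = xmlrpc.client.ServerProxy("http://boss:password@demo.cyn.in/")
# results = proxy.search("survey")
```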

Conclusion

cyn.in server offers best-of-breed integration and data APIs for getting data and files both into and out of itself. Many mechanisms are available, and more integration possibilities are constantly arising. Since data can be exchanged transparently with cyn.in server directly, there is no risk of vendor or data lock-in, and cyn.in is thus well suited to managing the storage and retrieval of files and content data.

Description
A summary of why cyn.in data is stored in ZODB, why it is better that way, and description of different data transfer mechanisms both in and out of cyn.in that are available today.
Comments (3)
rsprenkle Aug 04, 2010 04:49 AM
Excellent explanation. WRT External File Storage - it would be nice to have a link to further details. A desired feature would be an ability to convert to external file storage, and keep files in a filesystem tree that patterned the space hierarchy. Even better would be an external indexing engine that can run and provide google-like query of just file data
sgalaviz Mar 16, 2011 04:52 AM
What if you need to know about your users, not files? Is it possible to know if a user has logged in to the site? When was the user's first access? When was a particular user's last activity?
shage1966 May 15, 2011 12:46 AM
Would love some real documentation on the REST api; not just a mention. REST has clearly taken a dominant place in web API styles; no way I know of to implement XML-RPC on an iPhone using monotouch either.
 