Open:FactSet Forum

Data Exploration and the Cloud

FactSet is relatively new to external cloud computing. Historically, we ran everything in our own datacenters on software we built in-house. That has changed dramatically in the past few years as we have embraced third-party cloud providers such as Microsoft, along with open-source software.

FactSet initially developed Data Exploration to give potential clients an isolated SQL Server instance in which to trial the wide range of Data Feeds FactSet offers, whether our own data or data from our third-party providers. A solution like Data Exploration is difficult to scale and resource on your own. Building it requires planning for CPU, memory, redundancy, isolation of client processes, and many similar technical hurdles.

The Microsoft Azure cloud has let us spend less effort on those scaling concerns and more on building the product's business logic. An environment is spun up, everything available for purchase is loaded into it, and users can then explore the data and test their production models against it. We use Citrix and provide a variety of data analysis and visualization tools, including Python, Jupyter Notebooks, RStudio, MATLAB, and Tableau. We can now spin up 1, 10, 50, or 100 instances of Data Exploration within a few hours, which has cut the time it takes to get clients trialing the FactSet Standard Data Feeds from weeks or even months to hours.

Azure’s managed Kubernetes service has allowed us to build centralized infrastructure for running the FactSet Loader. Kubernetes manages compute resources and storage and schedules the containerized loader instances, so we can run many loaders simultaneously. We’re also investigating storage mechanisms such as standard network disks, blob storage, and NoSQL databases to make our Transcripts and StreetAccount data efficient to use within the client instances.
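To illustrate the pattern described above, here is a minimal Python sketch that builds one Kubernetes batch/v1 Job manifest per data feed. It assumes each feed is loaded by its own container; the image name, the --feed argument, the feed names, and the CPU and memory figures are all hypothetical, since the article does not describe how the FactSet Loader is packaged or configured. Because each Job declares resource requests, the Kubernetes scheduler can place as many loader pods as cluster capacity allows and run them concurrently.

```python
import json
import re

# Hypothetical container image; the real FactSet Loader image is not named in the article.
LOADER_IMAGE = "registry.example.com/factset-loader:latest"


def job_name(feed: str) -> str:
    """Turn a feed name into a valid Kubernetes object name (lowercase, '-', max 63 chars)."""
    slug = re.sub(r"[^a-z0-9]+", "-", feed.lower()).strip("-")
    return f"loader-{slug}"[:63].rstrip("-")


def loader_job(feed: str, cpu: str = "1", memory: str = "2Gi") -> dict:
    """Build a batch/v1 Job manifest that runs one loader container for one feed."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name(feed), "labels": {"app": "factset-loader"}},
        "spec": {
            "backoffLimit": 2,  # retry a failed load at most twice
            "template": {
                "spec": {
                    "restartPolicy": "Never",  # let the Job controller handle retries
                    "containers": [
                        {
                            "name": "loader",
                            "image": LOADER_IMAGE,
                            # Hypothetical argument telling the loader which feed to load.
                            "args": ["--feed", feed],
                            # Requests let the scheduler place pods on nodes with spare capacity;
                            # limits cap what a single loader can consume.
                            "resources": {
                                "requests": {"cpu": cpu, "memory": memory},
                                "limits": {"cpu": cpu, "memory": memory},
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Illustrative feed names only.
    feeds = ["FactSet Fundamentals", "FactSet Estimates", "FactSet Ownership"]
    # One independent Job per feed; Kubernetes runs them concurrently as capacity allows.
    manifests = [loader_job(f) for f in feeds]
    print(json.dumps(manifests, indent=2))
```

Each manifest can be written to its own JSON file and submitted with `kubectl apply -f`, which accepts JSON as well as YAML. Keeping one Job per feed means a failed load can be retried or rerun on its own without affecting the other loaders.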

Our next step is to turn Data Exploration into true production infrastructure for clients who want the data FactSet offers but do not want to manage the resources or the loading processes. We can do this by making further use of what Azure offers, including multiple regions, backup services, and elastic compute resources such as CPU and memory. Letting clients interact with a database as if it were on their own network, without the hassle of managing infrastructure or keeping the data updated, will be a game changer. And because all data sold through Open:FactSet and available in Data Exploration is concorded back to FactSet’s central symbology, users can go from having no infrastructure at all to a scalable, elastic environment with all the data they need in a matter of hours.

As we continue to build out this offering, we encourage clients to take Data Exploration for a trial spin and send us feedback so that we can keep improving the product.