Cloud Computing
There are too many definitions of cloud computing to address them all here. However, attribute-based security applies to most variants. In the public cloud, the enterprise can use this infrastructure to control the creation of and access to virtual machines in the cloud. This controls costs, gives the enterprise an opportunity to exert control over the security of machines that represent it, and allows for centralized oversight of what are essentially the enterprise’s resources. In high performance computing examples, this infrastructure can allow for the creation and destruction of virtual machines that attach to a local high performance cluster, either manually or automatically based upon load. In the enterprise cloud, using virtual machines that have the proper client software installed eliminates the need for separate security implementations on each virtual machine. For example, whether the enterprise instantiates 1, 100, or 10,000 virtual machines, attribute-based infrastructure makes provisioning security a matter of implementing policies across the machines, with all authentication issues already handled.
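As a rough illustration of how one policy can cover any number of virtual machines, the following Python sketch attaches a single policy object to every machine it provisions. The names `Policy`, `is_permitted`, and `provision`, and the attribute names in the usage note, are hypothetical and are not part of the Thebes software.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical policy: attribute name -> set of acceptable values."""
    required: dict


def is_permitted(policy: Policy, attributes: dict) -> bool:
    """Return True only if every required attribute carries an acceptable value."""
    return all(
        attributes.get(name) in allowed
        for name, allowed in policy.required.items()
    )


def provision(count: int, policy: Policy) -> list:
    """Create `count` virtual machine records that all share one policy object."""
    return [{"name": f"vm-{i:05d}", "policy": policy} for i in range(count)]
```

Under these assumptions, `provision(10000, Policy({"department": {"research"}}))` yields 10,000 machine records governed by the same rule, and changing the rule means editing one policy rather than each machine.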
Application and License Sharing
Application and license sharing is sometimes called an application grid. This class of examples has many different variations. Once a plug-in allows access to an application or license server based upon supplied attribute assertions, license management and application load sharing become much simpler. If an organization has a certain number of licenses for an application, these are often controlled with a license server, where the application goes to look for available licenses. No matter how many instances of an application there are, only a certain number can be in use at any given time. Attribute-based authorization to a license server can facilitate access to available licenses. Applications can be included in basic install images, available on every desktop, with users checking out a license for local use by offering their assertion to the license server. This has applications in cloud computing, allowing large numbers of virtual machines in the cloud with the enterprise’s entire suite of offerings available. This arrangement not only allows the enterprise to control access to licenses but also offers load balancing via the cloud. Combining this power with the ability to use local file systems with the same credentials will then improve security. A user can sign on once, gain access to a virtual machine with the required application on it, access a license for this application, and store work on a local file system, all with a single sign-on. Applications that maintain their own licensing controls can still benefit from this infrastructure by leveraging policy controls and application plug-ins. Whether the application is installed on the user’s machine or shared on a larger server, the attributes offered to the application will be verified against the policies in place for the application. This provides the same level of license control and load balancing as the previous example.
Finally, for applications in high demand, enterprises can take advantage of batch scheduling software commonly used in the high performance computing world to queue work and control load on each individual resource. Since the plug-ins for batch schedulers already exist, this solution can make access to expensive and popular resources more equitably available.
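The license checkout described in this section can be sketched in Python as follows. This is a minimal illustration, assuming the assertion has already been validated and reduced to a dictionary with hypothetical `subject` and `groups` fields; the `LicenseServer` class is invented for this example and does not model any particular license manager.

```python
class LicenseServer:
    """Hypothetical license pool gated by attributes in a user's assertion."""

    def __init__(self, total_licenses, allowed_groups):
        self.total = total_licenses
        self.allowed_groups = set(allowed_groups)
        self.checked_out = set()  # subjects currently holding a license

    def checkout(self, assertion):
        """Grant a license if the attributes qualify and a seat is free."""
        user = assertion["subject"]
        # Authorization: the assertion must carry at least one allowed group.
        if not self.allowed_groups & set(assertion.get("groups", [])):
            return False
        # A user who already holds a license keeps it.
        if user in self.checked_out:
            return True
        # Capacity: only a fixed number of licenses can be in use at once.
        if len(self.checked_out) >= self.total:
            return False
        self.checked_out.add(user)
        return True

    def release(self, user):
        """Return a license to the pool."""
        self.checked_out.discard(user)
```

The application can be installed everywhere because the limit is enforced at checkout time, not at install time.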
Distributed High Performance Computing
Background: Distributed high performance computing was the initial impetus for the Thebes consortium, and continues to be the most developed use case. In this model we add the complexity of job schedulers at each high performance computing (HPC) device as well as the transport of data sets in and out of the HPC devices. Users locate available resources via the resource discovery network (RDN), but in the HPC example there is a dynamic aspect to the metadata in the RDN, because how busy a resource is plays a key part in the decision whether to use it. The Thebes service installed on the resources will filter SAML (Cantor, Hodges, Morgan) assertions, check them against the policy enforcement point, and pass appropriate work to the local job submission tool.
Actors:
Systems administrators: Administrators at each enterprise will connect the identity provider to the local identity store and install a local resource discovery network node. This node will be introduced to one or more external RDN nodes. They will also optionally establish an enterprise-level policy administration point. As resource administrators connect compute or file system resources to the network, they will install the Thebes service on each resource, create policies, and publish their resource to the RDN. Each client computer at each shop and corporate office needs custom client software that plugs into the Thebes infrastructure. User authentication will be accomplished via the Thebes plug-in. This is equally true of local users and remote queries.
Researchers: The submit tool for Thebes is a simple Java installation. It will accept a username and password and perform the necessary work to obtain a signed assertion from the enterprise identity provider. Additionally, this tool will accept from the user a detailed description of the job in a format that is well understood by the popular job schedulers, as well as all the requirements of the job.
When the researcher submits the job, it is sent to a high-level scheduler that continuously collects dynamic data from all HPC resources known to the resource discovery network. It can either return an ordered set of resources for the user to choose from, or it can automatically select the most appropriate resource to submit the work to.
Local Management: In this case, local management can represent the various division and department heads that lie between upper management and the researcher. In some cases, it may be appropriate for these positions to assign policies to the resources that fall in their domain that are more stringent than the overall enterprise policies. In some cases, this layer of policy control might relieve the resource owners of the need for additional policies. If the system is set up to collect accounting information, management can use the data collected to cost share or invoice for computational time, or to justify expenditures to funding agencies.
Senior Staff: If Thebes is going to be used to cross administrative domains, there may need to be senior staff buy-in and participation to protect local interests and satisfy legal requirements. Generally, sharing resources will require agreements between the institutions involved in the exchange, with expectations, limitations, responsibilities, and requirements spelled out. Once this is in place, the policies agreed upon will have to be codified in the policy administration tools, which will represent the minimum set of restrictions that comply with the agreements.
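The two modes of the high-level scheduler described in this section (return an ordered list, or pick a resource automatically) might look like the following Python sketch. The resource fields `free_cores` and `queued_jobs` and the ranking rule are assumptions chosen for illustration; the actual scheduler may weigh different metadata.

```python
def rank_resources(resources, job):
    """Return resources that can fit the job, least busy first.

    Each resource is a dict with hypothetical keys 'name', 'free_cores',
    and 'queued_jobs', standing in for dynamic metadata from the RDN.
    """
    eligible = [r for r in resources if r["free_cores"] >= job["cores"]]
    # Prefer shorter queues; break ties in favor of more free cores.
    return sorted(eligible, key=lambda r: (r["queued_jobs"], -r["free_cores"]))


def select_resource(resources, job):
    """Automatic mode: take the top-ranked resource, or None if nothing fits."""
    ranked = rank_resources(resources, job)
    return ranked[0] if ranked else None
```

Here `rank_resources` corresponds to handing the user an ordered set to choose from, and `select_resource` to automatic submission.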
Higher Education: Financial Aid
Higher education financing is typically provided through various sources. The cost of attendance is rarely paid entirely out of pocket and rarely covered entirely by federal grants. Most students attending higher education institutions receive grants and loans from multiple streams, including the federal government, private banks, state grant agencies, and the school’s endowment fund. Federal loans and grants have aggregate limits that may span multiple years and multiple schools for a student. A network of institutions and funding agencies utilizing the Thebes infrastructure for internal and external communications can transfer loan and grant information from one school to another when a student transfers. It can also handle the increasingly common scenario of a student enrolled at multiple schools simultaneously. When multiple funding agencies are connected, students will be able to submit funding requests once and have the requests routed to all applicable agencies. This will increase the visibility of smaller specialized funds.
Actors:
Financial Aid Administrators: School Financial Aid Administrators (FAAs) are authorized to view an enrolling student’s loan history; however, they do not have credentials at previously attended schools. Each FAA receives authorization from their school, and the schools will form a federation to accept credentials via SAML assertions across institutions.
Students/borrowers: It is a cumbersome process to ensure that the best aid is awarded and received by the schools. By implementing a network with Thebes, the student can be assured that aid is awarded correctly and any errors or over-awards can be corrected with relatively little burden on the student. Not only does the student benefit from single sign-on, they also receive the benefits of a single-submit system that will reach all lenders and granting agencies on the system.
Lenders/state grant agencies/US Department of Education/private funds: By involving the lenders and private, state, and federal agencies, the lifecycle of the awards is consistent for the student. If a student re-enrolls in courses, the lenders can automatically know to give an in-school deferment. When the student finishes school, all parties to the awards can be notified and repayment can begin when appropriate. Should a borrower default on loans, the information gained from other participants in the network can be used for skip tracing and finding the most current address and contact information for a borrower. If a student loses eligibility for funding, agencies will be informed.
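The value of sharing loan history across schools can be seen in a small worked calculation. The Python sketch below totals disbursements reported by every school a student has attended and compares them with an aggregate limit. The record layout, the function name `remaining_eligibility`, and the dollar figures in the usage note are illustrative assumptions, not actual program rules.

```python
from decimal import Decimal


def remaining_eligibility(disbursements, aggregate_limit):
    """Return how much more a student may borrow under an aggregate limit.

    `disbursements` is a list of dicts with hypothetical keys 'school' and
    'amount' (Decimal), gathered from all schools in the federation.
    """
    total = sum((d["amount"] for d in disbursements), Decimal("0"))
    # Never report negative eligibility; an over-award shows up as zero.
    return max(aggregate_limit - total, Decimal("0"))
```

For example, with disbursements of 5,500 at one school and 6,500 at another against an illustrative limit of 30,000, the function returns 18,000. Without the second school's record, an administrator would compute 24,500 and risk an over-award.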
Higher Education: Transcripts
Each student applying to a postsecondary institution must provide a transcript of courses completed. In years past, transcripts were paper documents sent by the high school to each school the student applied to. Today the transcript is more often electronic, but it is not always transferred in a standards-compliant or automated way. Each school manually sends an electronic image of the student’s transcript, and this information is rarely available in a machine-readable format. Work is being done to create a standard format for these transcripts. The Thebes network can be used to locate the transcript information from high schools and negotiate the transfer to the higher education institutions automatically when the student applies. Policies would enforce the registrar’s attributes and student approval prior to release of the transcripts. As a part of the application process, the higher education institution’s systems can automatically request information from the high school system of record and accept the transcript into the admissions process. Additionally, when students transfer from one school to another, a certified transcript must be sent. The authentication and authorization processes built into Thebes can ensure that the transcript request is valid and authorized and that the transcript response is the official transcript and has not been modified by anyone. As with other database examples, this mechanism can improve data quality by keeping data as close to the source as possible, and make data access near real-time by replacing batch transactions with instant searches for data across all schools the student has attended.
Actors:
Students: By using the Thebes software and network, the students may be mostly removed from the process of sending and verifying transcripts. The student will log in using credentials assigned when first enrolled, and list the schools the transcripts should be made available to.
Once the student consents to having the transcripts made available to a list of schools, the rest of the process can continue without the intervention of the student. This eases the application and admissions process for the students. Students can check to see which schools have accessed their records.
Admissions Office: Each admissions office spends thousands of hours per year getting and processing high school transcripts from every student who applies. By joining the Thebes network, the process of retrieving the transcript data can be completely automated. Collecting standardized machine-readable transcripts will allow for automated comparisons of students. This allows admissions officers to spend their valuable time making admissions decisions rather than collecting data.
High School System Administrators: Each high school or school district will join the Thebes network in order to have their students’ information sent automatically to the higher education institutions. The same system will allow the local school districts access to the aggregated data about the performance of each individual school, teacher, and student. The biggest benefit for the school district, though, is that there is no need for separate authentication and authorization from the over 6,000 postsecondary schools across the nation.
School districts: There is considerable and growing pressure on school districts to account for the quality of education provided. Standardized machine-readable transcript formats combined with the ability to search all schools in the system with a single sign-on will provide instant and comprehensive data studies. The ability to collect individual and aggregated data about student, school, and teacher performance and compare this data against test scores will empower school districts to take positive action to concentrate on troubled schools.
Local, state, and national education officials: The ability to instantly access highly detailed education data creates a new level of oversight of the entire educational process. This satisfies federal mandates.
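The two checks described in this section, student consent before release and detection of a modified transcript, can be sketched in Python. This example substitutes a keyed hash (HMAC) from the standard library for whatever signing mechanism a deployment would actually use, and the record layout and function names are hypothetical.

```python
import hashlib
import hmac


def seal(transcript: bytes, key: bytes) -> str:
    """Compute an integrity tag for the transcript (HMAC-SHA256 stand-in)."""
    return hmac.new(key, transcript, hashlib.sha256).hexdigest()


def release_transcript(record, requester, key):
    """Release the transcript and its tag only to schools the student listed."""
    if requester not in record["consented_schools"]:
        return None  # no consent on file for this requester
    return record["transcript"], seal(record["transcript"], key)


def verify(transcript: bytes, tag: str, key: bytes) -> bool:
    """Check on the receiving side that the transcript was not altered."""
    return hmac.compare_digest(seal(transcript, key), tag)
```

A receiving school that calls `verify` on an altered transcript gets `False`, and a school the student did not list receives nothing at all.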
Medical Research Data
Background: Medical research data is often de-identified, and does not fall under HIPAA restrictions. This is not universally true, and it is critical to monitor the data this system is sharing for potential HIPAA violations. This is not meant to imply that the Thebes infrastructure is not usable under HIPAA rules, but rather to state that federal regulations will require a much more stringent set of rules on patient-identifiable data, and it may never be possible to share this data readily across enterprises. However, de-identified research data is already being shared in the scientific community. The largest and best funded effort in this arena is the cancer Biomedical Informatics Grid (caBIG), sponsored and created by the National Cancer Institute (NCI) in the United States. After spending billions of dollars to create the finest collection of bioinformatics data ever generated anywhere on the planet, they realized that they had created wonderful data silos that could not be readily spanned. This is not a condemnation of NCI; in fact, NCI was the first major international player in the data grid arena, and has contributed an incredible effort to the development of concepts, standards, and software. NCI took a firm policy against re-creating things that existed elsewhere; this policy has elevated existing development, with NCI funding only the creation of software needed to connect existing tools together. One of the first things NCI did was create an enterprise vocabulary. This model is one that has been referred to elsewhere in this paper. It was realized early on that unless every data element in every database connected to the grid was clearly defined and published, there would never be a way to conduct queries across the disparate databases. Some concurrent efforts in Europe opted to create translation tools to put in front of each database to solve this problem.
It seems clear that, whenever possible, the creation of an enterprise vocabulary is a more efficient way to create a data grid, particularly when working with a clean slate. Translation programs only make sense when working with legacy databases that would prove more difficult to replace than to translate. The NCI Enterprise Vocabulary is open source and freely available, and it shows no sign of going away. Those considering creating a medical research data system would be well served to study and potentially expand or adopt this work. No data grid can be sustained for long without an enterprise vocabulary. In a top-down environment, the creation of a data vocabulary is no more difficult than it would be when creating a single database. In other cases, there may be a lengthy and rather painful process to agree upon this vocabulary. Either way, the results are well worth the effort. Once the vocabulary is in place, the Thebes infrastructure can be used to ease the complexities involved in the actual sharing of the research data. Resource discovery tools facilitate finding databases. Attribute-based access to databases with appropriate policy limitations opens the door to searching of data.
Actors:
Developers: Most of the development efforts are outside the scope of a Thebes use case document; however, it should be documented that a Thebes infrastructure combined with an enterprise vocabulary would allow greater ease of development of research databases. Knowing that the tools exist for secure sharing of data, database discovery, and policy creation and enforcement, and that any term used to identify a data point is guaranteed to be well understood and widely accepted, will allow developers to create rich databases and detailed data discovery mechanisms to service these databases.
Systems administrators: Administrators at each research facility will connect the identity provider to the local identity store, install the various databases, and connect these databases to one or more nearby resource discovery nodes. Each researcher needs custom client software that plugs into the Thebes infrastructure. Authorization to the databases will be accomplished via the Thebes plug-in for both data entry and distributed queries.
Researchers: Whether the researcher is going to enter data or perform queries on local or distributed data, they will use a custom database access client to perform their work. One element of this client will be the Thebes plug-in, allowing the researcher to assert their attributes to gain appropriate access, for example write access to certain local tables, and read access to other local and certain distributed tables. There will be cases where a researcher’s credentials will gain write access to a remote collaborator’s database, using the same sign-on.
Local Management and Senior Staff: In this model, upper management is relieved of any responsibilities for creating data schemas, as the enterprise vocabulary will be an industry-wide creation rather than a local creation. In theory, there will be a process of deciding whether or not to accept the vocabulary, but failure to do so would amount to intellectual disbarment from the domain-specific community. Management will continue to be involved in the decision of what to share and whom to share with, although even some of those decisions may be beyond local control as funding agencies wield their control and researchers demand that certain connections be made with peers’ institutions. Some data will have a financial value attached to it, and accounting facilities will be attached to the policy controls, so proper invoicing can take place. This system facilitates the accounting for usage of both resource time and data consumption.
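The mix of read and write access described for researchers can be expressed as a small grant table. The Python sketch below is illustrative only: the table names, attribute names, and the `allowed` function are hypothetical, and a real deployment would evaluate policies at the policy enforcement point rather than in client code.

```python
# Hypothetical grants: which asserted attributes unlock which action on a table.
GRANTS = [
    {"table": "local.trials", "action": "write",
     "requires": {"institution": "home", "role": "researcher"}},
    {"table": "local.trials", "action": "read",
     "requires": {"role": "researcher"}},
    {"table": "partner.samples", "action": "read",
     "requires": {"role": "researcher", "consortium": "member"}},
]


def allowed(attributes, table, action):
    """Return True if any grant for this table and action matches the attributes."""
    return any(
        grant["table"] == table
        and grant["action"] == action
        and all(attributes.get(k) == v for k, v in grant["requires"].items())
        for grant in GRANTS
    )
```

Under these assumed grants, a home-institution researcher can write to `local.trials`, while a visiting researcher who is a consortium member can read both tables but write to neither, all from the same sign-on.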