
Higher Education: Financial Aid

Higher education financing is typically provided through various sources. The cost of attendance is rarely paid entirely out of pocket and rarely covered entirely by federal grants.

Most students attending higher education institutions receive grants and loans from multiple streams, including the federal government, private banks, state grant agencies, and the school’s endowment fund. Federal loans and grants have aggregate loan limits that may span multiple years and multiple schools for a student.
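As a rough illustration of why an aggregate limit needs a student's history from every school, the sketch below totals awards across schools and years and reports what remains. The `Award` type, the school names, the dollar amounts, and the limit are all hypothetical values chosen for the example; they are not actual program figures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Award:
    school: str
    year: int
    amount: int  # whole dollars


def remaining_eligibility(awards, aggregate_limit):
    """Return how much may still be borrowed under an aggregate limit."""
    total = sum(a.amount for a in awards)
    # Never report a negative balance; an over-award is handled elsewhere.
    return max(aggregate_limit - total, 0)


# Hypothetical history spanning two schools and three years.
history = [
    Award("School A", 2006, 3500),
    Award("School A", 2007, 4500),
    Award("School B", 2008, 5500),
]
HYPOTHETICAL_LIMIT = 23000  # illustrative only, not a real program limit

print(remaining_eligibility(history, HYPOTHETICAL_LIMIT))
```

A financial aid office that sees only the School B award would compute a very different remaining balance, which is the gap a shared network is meant to close.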

A network of institutions and funding agencies utilizing the Thebes infrastructure for internal and external communications can transfer loan and grant information from one school to another when a student transfers. It can also handle the growing scenario of a student enrolled at multiple schools simultaneously.

When multiple funding agencies are connected, students will be able to submit funding requests once and have the requests routed to all applicable agencies. This will increase visibility of smaller specialized funds.

Actors:

Financial Aid Administrators: School Financial Aid Administrators (FAAs) are authorized to view an enrolling student’s loan history; however, they do not have credentials at the schools the student previously attended. Each FAA receives authorization from their own school, and the schools will form a federation to accept credentials via SAML assertions across institutions.

Students/borrowers: It is a cumbersome process to ensure that the best aid is awarded and received by the schools. By implementing a network with Thebes, the student can be assured that aid is awarded correctly and any errors or over-awards can be corrected with relatively little burden on the student. Not only does the student benefit from single sign-on, they also receive the benefits of a single-submit system that will reach all lenders and granting agencies on the system.

Lenders/state grant agencies/US Department of Education/private funds: Involving the lenders and the private, state, and federal agencies keeps the lifecycle of the awards consistent for the student. If a student re-enrolls in courses, the lenders can automatically know to give an in-school deferment. When the student finishes school, all parties to the awards can be notified and repayment can begin when appropriate. Should a borrower default on loans, the information gained from other participants in the network can be used for skip tracing and finding the most current address and contact information for a borrower. If a student loses eligibility for funding, agencies will be informed.

Higher Education: Transcripts

Each student applying to a postsecondary institution must provide a transcript of courses completed. In years past, the transcripts were paper documents sent by the high school to each school the student applied to. Today the transcript is more often than not electronic, but it is not always transferred in a standards-compliant or automated way. Each school manually sends an electronic image of the student’s transcript, and this information is rarely available in a machine-readable format. Work is being done to create a standard format for these transcripts. The Thebes network can be used to locate the transcript information from high schools and negotiate the transfer to the higher education institutions automatically when the student applies. Policies would enforce the registrar’s attributes and student approval prior to release of the transcripts. As a part of the application process, the higher education institution’s systems can automatically request information from the high school system of record and accept the transcript into the admissions process.

Additionally, when students transfer from one school to another, a certified transcript must be sent. The authentication and authorization processes built into Thebes can ensure the transcript request is valid and authorized and that the transcript response is the official transcript and is not modified by anyone.

As with other database examples, this mechanism can improve data quality by keeping the data as close to the source as possible, and make data access near real-time by replacing batch transactions with instant searches for data across all schools the student has attended.

Actors:

Students: By using the Thebes software and network, the students may be mostly removed from the process of sending and verifying transcripts. The student will log in using credentials assigned when first enrolled, and list the schools the transcripts should be made available to. Once the student consents to having the transcripts made available to a list of schools, the rest of the process can continue without the intervention of the student. This eases the application and the admissions process for the students. Students can check to see which schools have accessed their records.

Admissions Office: Each admissions office spends thousands of hours per year getting and processing high school transcripts from every student that applies. By joining the Thebes network, the process of retrieving the transcript data can be completely automated. Collecting standardized machine readable transcripts will allow for automated comparisons of students. This allows the admissions officer to spend their valuable time making the admissions decisions and not wasting time collecting data.

High School System Administrators: Each high school or school district will join the Thebes network in order to have their students’ information sent automatically to the higher education institutions. The same system will allow the local school districts access to the aggregated data about the performance of each individual school, teacher, and student. The biggest benefit, though, for the school district is that there is no need for separate authentication and authorization from the over 6,000 postsecondary schools across the nation.

School districts: There is considerable and growing pressure on school districts to account for the quality of education provided. Standardized machine readable transcript formats combined with the ability to search all schools in the system with a single sign-on will provide instant and comprehensive data studies. The ability to collect individual and aggregated data about student, school, and teacher performance and compare this data against test scores will empower school districts to take positive action to concentrate on troubled schools.

Local, state, and national education officials: The ability to instantly access highly detailed education data creates a new level of oversight of the entire educational process. This satisfies federal mandates.

Medical Research Data

Background: Medical research data is often de-identified, and does not fall under HIPAA restrictions. This is not universally true, and it is critical to monitor the data this system is sharing for potential HIPAA violations. This is not meant to imply that the Thebes infrastructure is not usable under HIPAA rules, rather to state that federal regulations will require a much more stringent set of rules on patient identifiable data, and it may never be that this data can be readily shared across enterprises.

However, de-identified research data is already being shared in the scientific community. The largest and best funded effort in this arena is the cancer Biomedical Informatics Grid (caBIG), sponsored and created by the National Cancer Institute (NCI) in the United States. After spending billions of dollars to create the finest collection of bioinformatics data ever generated anywhere on the planet, they realized that they had created wonderful data silos that could not be readily spanned. This is not a condemnation of NCI; in fact, NCI was the first major international player in the data grid arena, and has contributed an incredible effort to the development of concepts, standards, and development. NCI took a firm policy against re-creating things that existed elsewhere; this policy has elevated existing development while NCI only funded creation of software they needed to connect existing tools together.

One of the first things NCI did was create an enterprise vocabulary. This model is one that has been referred to elsewhere in this paper. It was realized early on that if every data element in any database connected to the grid was not clearly defined and published, there would never be a way to conduct queries across the disparate databases. Some concurrent efforts in Europe opted to create translation tools to put in front of each database to solve this problem. It seems clear that whenever possible the creation of an enterprise vocabulary is a more efficient way to create a data grid, particularly when working with a clean slate. Translation programs only make sense when working with legacy databases that would prove to be more difficult to replace than to translate.

Of course, the NCI Enterprise Vocabulary is open source and freely available. It shows no sign of going anywhere. Those considering creating a medical research data system would be well served to study and potentially expand or adopt their work. No data grid can be sustained for long without an enterprise vocabulary.

In a top-down environment, the creation of a data vocabulary is no more difficult than it would be when creating a single database. In other cases, there may be a lengthy and rather painful process to agree upon this vocabulary. Either way, the results are well worth the effort.

Once the vocabulary is in place, the Thebes infrastructure can be used to ease the complexities involved in the actual sharing of the research data. Resource discovery tools facilitate finding databases. Attribute based access to databases with appropriate policy limitations opens the door to searching of data.

Actors:

Developers: Most of the development efforts are outside the scope of a Thebes use case document; however, it should be documented that a Thebes infrastructure combined with an enterprise vocabulary would allow greater ease of development of research databases. Knowing that the tools exist for secure sharing of data, database discovery, and policy creation and enforcement, and that any term used to identify a data point is guaranteed to be well understood and widely accepted, will allow developers to create rich databases and detailed data discovery mechanisms to service these databases.

Systems administrators: Administrators at each research facility will connect the identity provider to the local identity store, install the various databases, and connect these databases to one or more nearby resource discovery nodes. Each researcher needs custom client software that plugs into the Thebes infrastructure. Authorization to the databases will be accomplished via the Thebes plug-in for both data entry and distributed queries.

Researchers: Whether the researcher is going to enter data or perform queries on local or distributed data, they will use a custom database access client to perform their work. One element of this client will be the Thebes plug-in, allowing the researcher to assert their attributes to gain appropriate access, for example write access to certain local tables, and read access to other local and certain distributed tables. There will be cases where a researcher’s credentials will gain write access to a remote collaborator’s database, using the same sign-on.

Local Management and Senior Staff: In this model, upper management is relieved of any responsibilities for creating data schemas, as the enterprise vocabulary will be an industry-wide creation rather than a local creation. In theory, there will be a process of deciding whether or not to accept the vocabulary, but failure to do so would amount to intellectual disbarment from the domain specific community. Management will continue to be involved in the decision of what to share and who to share with, although even some of those decisions may be beyond local control as funding agencies wield their control and researchers demand certain connections be made with peer institutions. Some data will have a financial value attached to it, and accounting facilities will be attached to the policy controls, so proper invoicing can take place. This system facilitates the accounting for usage of both resource time and data consumption.

Retail Point of Sale and Data Usage

This example is a large chain of small auto service shops. Each shop has approximately 20 employees and a small network with a local point of sale system writing to a local database. Each shop maintains its own security and identity store, under rules and oversight from the corporate offices. Senior employees at each shop have permission to write to the financial portion of the database; all employees are able to check in cars, detail what has been done at each visit, and review previous work.

Historically, there may be a single database under corporate control with each store writing to this database periodically. Or, there may be no communications between each store and the central office. Assuming some sort of routine data transfer from stores to home office, employees would need a second set of credentials to connect to the corporate database. Data would be uploaded to the corporate office from each database either in near real time or in batches.

Rather than maintaining a monolithic database, Thebes allows for a widely distributed solution. Each shop maintains its own identity store and a local copy of the database. Data entry is done locally and data is stored locally. Data is never transferred to the home office; instead it is retrieved in real time when needed via searches that can span the entire system. It may be useful to keep multiple copies of each shop’s data at one or more other shops to improve data access speed and protect against a single node dropping out. Only data relevant to a query would be transferred; the results can be saved or not depending upon the needs of the requester.
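The query-on-demand model described above can be sketched as a fan-out search: each shop answers from its own local data, and only matching records travel back to the requester. This is a minimal in-memory illustration, not the Thebes implementation; `ShopNode`, `search_all`, and the record fields are hypothetical names, and a real deployment would make network calls where this sketch calls a method.

```python
from concurrent.futures import ThreadPoolExecutor


class ShopNode:
    """Stands in for one shop's local database (hypothetical)."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # list of dicts, kept locally

    def search(self, vin):
        # Return only the records relevant to the query, tagged with origin.
        return [dict(r, shop=self.name) for r in self._records if r["vin"] == vin]


def search_all(nodes, vin):
    """Query every shop concurrently and merge the matching records."""
    if not nodes:
        return []
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        per_shop = list(pool.map(lambda node: node.search(vin), nodes))
    return [record for records in per_shop for record in records]


shops = [
    ShopNode("Shop 1", [{"vin": "VIN123", "work": "oil change"}]),
    ShopNode("Shop 2", [{"vin": "VIN123", "work": "brakes"},
                        {"vin": "VIN999", "work": "tires"}]),
]
for record in search_all(shops, "VIN123"):
    print(record["shop"], record["work"])
```

Nothing is copied to a central store; the caller decides whether to keep the merged results.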

Actors:

Systems administrators: Administrators at each shop will connect the identity provider to the local identity store, install the local instance of the database, and connect the database to the nearest resource discovery node. Each client computer at each shop and corporate offices needs custom client software that plugs into the Thebes infrastructure. Authorization to the database will be accomplished via the Thebes plug-in. This is equally true of local users and remote queries.

Technicians: The junior technician logs in to the database using the same username and password pair that they use to connect to corporate email and the local area network. The level of access the technician receives will be based upon the attributes in their LDAP entry. When a customer arrives at the shop, the technician will collect a minimum set of information about the vehicle. Possible data collection points might be VIN, license plate number, or owner name and phone number. This can even be accomplished on a hand-held device. If the vehicle has a local history, the local details will be immediately displayed, and a search of the data grid will immediately commence. All work done by that company at any location to that vehicle will be returned to the technician’s workstation. Remote databases are located dynamically via the resource discovery network. The technician uses this data to make service recommendations to the customer.

As work is completed, it is entered into the database. Because this is local data, the response time is fast. It is also available in real time to senior management for evaluation.

When the work is completed, a senior technician/manager logs in to check the customer out. The senior manager has additional attributes in the LDAP directory. These attributes automatically grant access to make entries in the financial area of the system. The work performed that session is displayed. If there are warranty issues, the senior technician will have the necessary data available, no matter where the previous work was performed.
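The attribute-driven access described for junior and senior technicians might look roughly like the sketch below. The attribute name `role`, its values, and the action names are assumptions made for illustration; the source does not specify the LDAP schema or the policy language Thebes uses.

```python
# Hypothetical policy: which attribute values may perform which action.
POLICY = {
    "check_in": {"technician", "senior-technician"},
    "record_work": {"technician", "senior-technician"},
    "write_financial": {"senior-technician"},
}


def is_permitted(user_attributes, action):
    """Grant access when any of the user's role values is allowed the action."""
    allowed = POLICY.get(action, set())  # unknown actions are denied
    return bool(allowed & set(user_attributes.get("role", [])))


# Attributes as they might be read from each user's directory entry.
junior = {"role": ["technician"]}
senior = {"role": ["technician", "senior-technician"]}

print(is_permitted(junior, "write_financial"))
print(is_permitted(senior, "write_financial"))
```

The decision depends only on the asserted attributes, so the same check works for a local login and for a query arriving from another shop.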

Customers: When a customer is checked in, the entire history of the vehicle (in that company’s shops) is available to the technician. The technician will check the customer in to the local, comparatively lightweight database, at local area network speeds. The customer will receive a quick high quality experience. In the background, the system will recover any work done to that vehicle anywhere in the network. At the end of the visit, a senior technician will handle checkout and financial transactions based upon that technician’s higher level of access.

This also gives the customer other opportunities. There are a variety of reasons a customer may want access to data about work done on a vehicle. Individual owners may want to use this system to track maintenance to their vehicle and plan for upcoming routine maintenance expenses. Corporate fleet managers may want detailed reports about the maintenance done to their vehicles. This is possible via a web portal interface to the database system. Usernames and passwords would be assigned when a customer is actually in the shop, giving a higher level of assurance to the security mechanism. This is a possible sales tool for this company, and can encourage customer loyalty simply because it provides a single source for maintenance records.

Local Management: Local managers will have access in real time to all transactions being handled in their shop. This access will be available using the same client and authorization mechanism the other players have, but their attributes grant them a higher level of access. The data available to local management will be real time, to the level of being able to study how many cars are in the shop at any given moment.

Senior Staff: Prior to the installation of the system, senior management will have decisions to make. A single data structure will be created. This is executed in a standardized database that all shops in the system will use. A Thebes enabled client to the database system will allow senior staff to collect and analyze data from all distributed databases via queries to the data grid. Once these decisions are made and the system is installed, management will have immediate, real time, access to all details of the entire system and each individual shop in the system. The level of detail will be as rich as the design of the database will allow.

Exciting progress in Thebes; new ways to look at distributed computing

So, Tim Bornholtz is posting his most recent additions to the service interface and client user interface projects this week. In his service interface, he has managed to implement nearly the entire DRMAA specification to JSDL, allowing JSDL messages to arrive, be parsed to DRMAA, and executed. For now, he has built against Sun Grid Engine, but migrating to Condor, PBS, or Platform LSF should be a matter of building against the different libraries. Additionally, he has adapted Virginia Tech's interesting JSDL generator to become the core of a job submission tool.
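To give a feel for what parsing JSDL toward DRMAA involves, here is a small sketch that reads the executable and arguments from a JSDL document and returns them under the DRMAA job-template attribute names `drmaa_remote_command` and `drmaa_v_argv`. It is not Tim's code and covers only those two fields; the sample document and the function name `jsdl_to_job_template` are made up for the example, and no scheduler is contacted.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the JSDL 1.0 specification; the prefixes are arbitrary.
NS = {
    "jsdl": "http://schemas.ggf.org/jsdl/2005/11/jsdl",
    "posix": "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix",
}

SAMPLE_JSDL = """\
<jsdl:JobDefinition xmlns:jsdl="http://schemas.ggf.org/jsdl/2005/11/jsdl"
                    xmlns:posix="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
  <jsdl:JobDescription>
    <jsdl:Application>
      <posix:POSIXApplication>
        <posix:Executable>/bin/echo</posix:Executable>
        <posix:Argument>hello</posix:Argument>
        <posix:Argument>grid</posix:Argument>
      </posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
"""


def jsdl_to_job_template(jsdl_text):
    """Extract the command and arguments a DRMAA job template would need."""
    root = ET.fromstring(jsdl_text)
    app = root.find(
        "jsdl:JobDescription/jsdl:Application/posix:POSIXApplication", NS
    )
    if app is None:
        raise ValueError("no POSIXApplication element found")
    executable = app.findtext("posix:Executable", namespaces=NS)
    if not executable:
        raise ValueError("no Executable element found")
    args = [a.text or "" for a in app.findall("posix:Argument", NS)]
    return {"drmaa_remote_command": executable.strip(), "drmaa_v_argv": args}


print(jsdl_to_job_template(SAMPLE_JSDL))
```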

So, for now, a user can create a job and submit it to a known compute service.

We're beginning a highly rudimentary start to a security token service. This means we're building something that can accept a username and password and return SAML. This is using web services instead of the web user mechanisms of Shibboleth. Eventually it would be nice to add SAML to SAML, and anything to Kerberos, etc, but username to SAML is what we need today.
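The username-to-SAML exchange can be pictured with a toy like the one below: check a password, then emit a bare SAML 2.0 assertion carrying the user's attributes. This is only a sketch under loud assumptions. The user store, salt, issuer URL, and attribute names are hypothetical, the assertion is unsigned and omits conditions and authentication statements, and it says nothing about how our actual token service is built.

```python
import hashlib
import hmac
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SAML_NS = "urn:oasis:names:tc:SAML:2.0:assertion"
SALT = b"demo-salt"  # hypothetical; a real store uses a random salt per user


def _hash(password):
    return hashlib.pbkdf2_hmac("sha256", password.encode(), SALT, 100_000)


# Hypothetical user store with one researcher.
USERS = {"alice": {"hash": _hash("s3cret"), "attributes": {"role": "researcher"}}}


def issue_assertion(username, password, issuer="https://sts.example.org"):
    """Return an UNSIGNED, minimal SAML 2.0 assertion as an XML string."""
    user = USERS.get(username)
    if user is None or not hmac.compare_digest(user["hash"], _hash(password)):
        raise PermissionError("invalid credentials")

    def tag(name):
        return "{%s}%s" % (SAML_NS, name)

    ET.register_namespace("saml", SAML_NS)
    assertion = ET.Element(tag("Assertion"), {
        "ID": "_" + uuid.uuid4().hex,
        "Version": "2.0",
        "IssueInstant": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    })
    ET.SubElement(assertion, tag("Issuer")).text = issuer
    subject = ET.SubElement(assertion, tag("Subject"))
    ET.SubElement(subject, tag("NameID")).text = username
    statement = ET.SubElement(assertion, tag("AttributeStatement"))
    for name, value in user["attributes"].items():
        attribute = ET.SubElement(statement, tag("Attribute"), {"Name": name})
        ET.SubElement(attribute, tag("AttributeValue")).text = value
    return ET.tostring(assertion, encoding="unicode")


print(issue_assertion("alice", "s3cret"))
```

Without a signature nobody should trust this output; the point is only the shape of the exchange, credentials in and attributes out.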

We met with the PVFS folks at Clemson. Fascinating stuff coming down the road, and an excellent fit with our interest in SAML. We're looking at not only doing the authorization by filtering from our SAML assertion, but now it could be possible to populate file metadata from the same assertion. This is powerful stuff, not just in HPC and Grid, but in enterprise file systems and preservation and archiving data. A real break from the UNIX file system in form and function.

It is almost time to start looking at resource descriptions and resource discovery. Well, it's already past time, but we don't have the volunteers. GLUE2? Extensions of JSDL? Something else? Tim has posted a schema based upon JSDL. So far no comments. One thing is clear: monolithic resource indexing is a non-starter. We need something with a peer-to-peer and hierarchical architecture. Of course, once this is built, it can also be used well outside the HPC/grid arena. This sort of architecture can answer any version of "who is publishing foo for me to see?"
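One way to picture a hierarchical answer to "who is publishing foo?" is the toy below: each discovery node knows only what was published to it, and a query starts locally and widens one level at a time. Nothing here reflects a schema or protocol we have chosen; `DiscoveryNode` and its methods are invented for the illustration, and the peer-to-peer links between nodes at the same level are left out.

```python
class DiscoveryNode:
    """A node in a hypothetical discovery hierarchy; no central index."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.published = {}  # resource type -> set of publisher names
        if parent is not None:
            parent.children.append(self)

    def publish(self, resource_type, publisher):
        self.published.setdefault(resource_type, set()).add(publisher)

    def _search_down(self, resource_type):
        # Collect publishers known to this node and everything beneath it.
        found = set(self.published.get(resource_type, ()))
        for child in self.children:
            found |= child._search_down(resource_type)
        return found

    def who_publishes(self, resource_type):
        """Ask locally first, then widen the search one level at a time."""
        node = self
        while node is not None:
            found = node._search_down(resource_type)
            if found:
                return found
            node = node.parent
        return set()


root = DiscoveryNode("root")
campus_a = DiscoveryNode("campus-a", parent=root)
campus_b = DiscoveryNode("campus-b", parent=root)
campus_a.publish("foo", "cluster-a")

print(campus_b.who_publishes("foo"))
```

Because the search stops at the first level that yields an answer, nearby publishers are preferred over distant ones.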

Finally, SWITCH is publishing some interesting work in policy administration, generation, and enforcement. This is going to be nice, their approach will allow for plugging directly into our service interface.

If any of this strikes your fancy, contact me. We need good volunteers (or money to hire good coders). Or good ideas. Or even bad ideas we can argue about.

Arnie Miles

The Grid is Dead: Final repost of missing GridsWatch articles

This is the final repost of missing GridsWatch postings:

So, it seems to me, imho, that the closely held definition of the word grid by the keepers of truth in the grid world, combined with the ham-handed misuse of the term by vendors, has pretty much killed "Grid". At least the word, but not the concept.

I don't find it surprising. The original software was built to satisfy a specific use case. A small group of well funded big science academic researchers gobbling up all the resources they can lay their hands on. Some papers and a couple of big books later, we have a highly complex system that only a developer could love. But...the word got buzz, it was exciting, excellent analogies were built (power grid, transit grid), stories grew up around the concept. So corporations got curious and started asking, and vendors started writing their own definitions.

The most successful grids in the world aren't. They emulate security by limiting how many scientists can ask questions. The rules are that the scientist has to have altruistic needs and a good story. If the story is good enough, these grids can attract joe and jill computer owners to connect. GU is the 115th largest team on World Community Grid based upon a 5 point extra credit assignment given to the Intro to CS for non-majors class.

It's time to evaluate the use cases and build something valuable. A computational Internet. A place where owners of resources can allow groups of researchers to run applications safely. If you want your computer to be used to help cure cancer, you should have the power to give ALL cancer researchers access.

All it takes is a way to prove a person is a cancer researcher. Resource owners don't need to know each researcher's name, they just need to know where cancer research is being performed. Identity providers at each research center can handle proving whether or not a person is a cancer researcher.

SAML provides the tools to do this, but tacking SAML solutions onto existing grid solutions adds more complexity. Let's see about making job schedulers, file systems access, sensors, and license managers SAML aware. Let's see about building a simple to join consortium of resources that publishes resources to be discovered by a resource discovery network made available to all researchers.

Let's make whatever replaces 'grid' look like the Internet. Even the playing field. Rip the grid out of the hands of big science and make shared resources available to everyone.

Arnie Miles
