Student Records Data Warehouse Project Notes of Meeting 14 May 2003 Present: Ray Burnley, Sue Milward ------------------------------------------------------------------------ A meeting was held on 28 March following a request for access to student records data by some phase 4 schools. It was agreed that a short-term (approx 1 year) solution was required to provide data for Maths and SHiPSS. It was also noted that this data would need to be available by Sept 2003 and would be need to be available in some form of 'data warehouse’. Actions from the meeting were: * Sue to identify resource to carry out the work * Ray to produce a specification of data fields required by 9 May * Ray/Sue to identify suitable server and format to hold the data Items discussed at this meeting were: RESOURCE * It is understood that Ray is still formally assigned 40% to work for Admin Computing, although this resource has not been called upon for some time. * Discussions between IT management and Ray suggest that his time may well be fully occupied on other projects from Summer 2003 onwards * It is assumed that until that time, only 40% of Ray’s time would be available for Admin Computing tasks DATA REQUIREMENTS * Ray has still to contact Maths and SHiPSS to produce a list of data fields required in the warehouse and also to investigate their views on suitable ways to present the data. Ray also to estimate the size of the total dataset. * Following the initial specification of the data needs, Ray has added a request from all SAM schools for production of teaching/tutorial group timetables. This may require extra derived data in the warehouse and would have implications for rolling out the warehouse across more schools than anticipated. Sue to check this requirement with Emma and ask Emma to produce definitive specification for the uses of the warehouse. DATA FORMATS Four different ways of presenting the data were discussed based on the following assumptions: * Security. It will be the responsibility of the developers of external systems to ensure that they do not publish or release any data not relevant to themselves * The developers of the external systems will adapt their own processes to extract data from the warehouse, ie IT Services will be providers of data only * Warehouse data will be refreshed on a daily basis by IT Services. POSSIBLE DATA FORMATS 1) CSV format. SQL scripts would be written to extract relevant data into flat CSV file(s). These would be made available on an appropriate server. Pros: * Very simple to use * No end-user training required Cons: * If the datasets are large, there could be performance implications 2) Oracle database. Data would be extracted using SQL into a separate Oracle instance (or a separate schema within the SITS live instance) Pros: * Extensive expertise exists within Admin Computing to maintain such a system with minimum resource * Data stored thus would be able to exploit full database features to maintain ultimate performance Cons: * If the extra database was stored on the main database server Apollo, there could be performance implications. Ideally an extra Unix server (cost approx £1K) would be used 3) Continue to use SAM, but in read-only mode. Ray’s SAM system would be maintained with data extracts from SITS running as previously Pros: * The software SAM already exists * SHiPSS are already extracting data successfully from SAM Cons: * Maintenance of a complex system with potentially no support available 4) Use XML. XSU scripts would be used to extract 'flat’ files in XML format. XSLT scripts would then be required to extract required data Pros: * Use of new technology would be very interesting and may prove useful for future projects Cons: * Use of unknown new technology with no expertise in Admin Computing, or anywhere within the University * Developers of external systems would be responsible for extracting data from XML files. Training may be required. Following further internal discussions within Admin Computing, it is felt that the most sensible option to use at this time is that of an Oracle data warehouse. However there will be a need for a suitable server to run the warehouse. It is therefore proposed that subject to allocation of resource, server and agreement of the potential users then development work should continue as soon as possible. Sue Milward, 19-May-2003