Here are some project ideas I have for PhD, MPhil, MSc (team or individual) and Final Year Under-grad projects (see list below).
If you are interested in these topics and would like to discuss them, then get in touch. But make sure that you have performed some detailed research on the proposed topics. This will allow you to come to the discussion with ideas and questions, rather than saying that “I’m interested in this topic, can you tell me more about it”.
MSc, MPhil and PhD projects (some of these projects could be adapted for under-grad projects)
|Analysing Trends in Conference Papers||Using the data from this site on best conference papers from the past 25 years, can you identify any trends, insights, etc. from these papers? These can be illustrated to show how trending topics have evolved over time and how they relate to other events in the IT industry.|
|MLOps / AIOps||There are a variety of different possibilities for deploying models in production. These can range from containerisation, serverless functions, virtual machines, Docker, etc. Additionally, these can provide different delivery mechanisms, such as APIs and REST calls.
This project will examine the various solutions available on AWS, GCP, Azure and Oracle Cloud to assess the ease of initial deployment, the ease of updating models in these environments, the ability to call and use the models, and the benchmarking of these models based on the number of calls per second, or similar.
|DS||Using Pre-built Models for Image Classification and Knowledge Extraction||Most of the cloud providers (AWS, GCP, Oracle, Azure) are building ML/AI applications providing prebuilt models for image classification, object detection and text extraction. These are aimed at making it easier to use this technology and at allowing a wider audience (beyond the Data Scientist) to use this functionality without the need to know or understand what is happening under the hood at the language, library, algorithm, parameter, training and testing levels.
This project will look at evaluating the offerings from these different cloud vendors, to assess the functionality provided, the ease of use, the range of possible use cases, their accuracy at prediction and knowledge extraction, etc.
All without the need to be a data scientist or machine learning expert.
The time of being an “expert” Data Scientist or Machine Learning expert has come to an end. Data Science and Machine Learning is no longer a specialist skill, but is now a generalist skill that people in IT, Social Sciences, Marketing, Physical Sciences, Chemical Sciences, and lots of other areas have.
|ASD||Evaluation of Linux distributions||Linux is one of the most common/popular environments on most servers, including database servers, website hosts, application tiers, etc.
This project will look at evaluating 5-6 different Linux distributions (Red Hat, Ubuntu, Oracle Linux, etc.), including free/open source distributions. The evaluation will examine their performance for different use cases and with different workloads, to measure and compare their efficiency. You will be able to benchmark your results (and test environments) against publicly available results.
|ASD||Evaluation of JOOQ versus Hibernate and others||Hibernate is widely used but has MANY MANY problems, which usually result in writing and executing very inefficient queries. This gives the impression that the Database is running slowly, but in reality the SQL code generated by Hibernate is just BAD.
An alternative is JOOQ (http://www.jooq.org/learn/). There are several other options. This project will evaluate JOOQ versus Hibernate (and others) to explore their differences, their limitations, the issues they cause and experiment to see which one really works best.
This project requires a good understanding of database internals and query optimization.
|DA & ASD||Photographic Restoration using Deep Learning||Explore and evaluate various Deep Learning algorithms and models for restoring old photographic images into something that looks recent/modern.
For example, have a look at these GitHub repos.
|DA & ASD||GitHub Copilot||How good is it really?
How useful is it really?
What are the problems?
This project will perform a full evaluation of using GitHub Copilot for a number of applications, with comparison to code already written or written by experts, and comparison with alternative approaches.
|DA & ASD||Algorithmic Trading||There are a lot of different algorithms available for trading stocks. Lots of recent research has focused on advanced Machine Learning techniques. But there have been mixed results, and such solutions are complex. Other solutions look at using simpler machine learning algorithms and including Natural Language Processing (NLP), Sentiment Analysis, etc. to make predictions.
But how do these approaches really compare with the more traditional approaches used for many decades, based on simple statistics such as moving averages, regression, etc.?
In this project you will evaluate some of these to determine what works, what kinda works, what is just too complicated, etc
There are lots of possibilities for this project and each person can have their own focus and interest areas.
Open to many students working on this simultaneously.
|DA & ASD||Evaluation of Application Translation Layers to support Application migrations||For example, Babelfish is a translation layer for Amazon Aurora PostgreSQL that enables Aurora to understand commands from applications written for Microsoft SQL Server.
But can it work for other Databases?
What are the alternative solutions to Babelfish? Can you compare these products, including their impact on portability, application performance, etc.?
|DA & ASD||Evaluating the Ethical and Legal Implications of Data Mining, Machine Learning and Analytics||Over the past few years there has been a growing interest in the areas of Ethics and Legal Implications of Analytics, Data Science, Data Mining, Machine Learning, etc. These two topics are very different and yet they have a large overlap.
This project will examine how the EU and other political regions have adapted their legal systems. Additionally, we have seen a number of different Ethics frameworks being put forward. Given the nature of these two topics, the project will examine these, defining overlaps, gaps, directional changes and how state of the art research can improve current practices.
|DA & ASD||Evaluation of MLOps Frameworks||A large number of MLOps frameworks exist and the list of these is constantly growing. But how good or how complete are they?
This project will define a set of evaluation criteria, based on research, and will then apply the criteria to 3-5 MLOps frameworks to determine their completeness, efficiency and cross platform support, among other things.
|ASD||Voice Controlling a Database
Voice recognition & Text to speech
Building an Accessible Application
|Everyone uses SQL to access and process data in a Database. SQL is a 40+ year old language and is very commonly used in all databases, no matter their type.
This project will look at building an accessible interface to the database, allowing people to create SQL queries using voice instructions. These instructions can have the same structure as a typical SQL statement, both ANSI SQL and the SQL implementation of the chosen database.
Additional features will examine using a markup type syntax for SQL. There are a number of examples of this, and the project will look to implement one of the most commonly used. SQL Markup syntax is a form of shorthand syntax for writing SQL. The application will take instructions from the user's voice commands and will construct the SQL, providing prompts and feedback to the user as necessary.
When the query results are retrieved, the application will converse with the user on how to share these results, from providing some aggregate and summary information, to playing back all or part of the query result set.
|DA & ASD||Evaluation of lite ML frameworks for deployment on IoT devices, phones, tablets and other similar devices||In recent times there has been a move to deploy ML and other advanced analytics on low power computing devices.
This project will examine the various Frameworks, Libraries and supporting languages for the deployment on such devices. Careful measurement is needed to determine and evaluate the real effect of doing this and its viability.
|DA & ASD||Adding new functionality to a Database:
Can you add Data Graphing capabilities to a Database, providing functionality similar to ggplot2 that can be called using SQL?
|The Oracle Database allows you to extend the functionality of the Database by including External Procedures (external procs). These allow you to write functions using Java and C, and have these registered in the Database, allowing these external procs to be called from the Database using SQL.
This project will look at creating functions, using external procs, to create similar functionality to ggplot2 (in R), and have this functionality accessible using SQL in your typical SELECT statements. The project will evaluate the performance of using external procs and examine the integration of this functionality with other aspects of the Database. All images produced should be in BLOB format, allowing for the querying, displaying and storing of these images in the database.
|DA||Comparison and Evaluation of Machine Learning Interchange Formats and Tools||There are a number of machine learning interchange formats including PMML, PFA, ONNX, and many more. This project will examine these, developing examples to show their capabilities and weaknesses across multiple languages. What format to use, for what kind of models, for what languages, etc.: all of these and many more aspects will be considered.|
|DA & ASD||Fake News Detection||Over the past few years we have been hearing a lot about fake news. The aim of this project is to explore the research in this area and to develop a fake news detection algorithm. Building upon previous research, the project will select various elements and apply them to a regional context (e.g. Ireland), and then compare how the algorithm performs for other regions. By doing this you will be able to assess if there are geographic variations in how fake news is used around the world.|
|DA & ASD||Are Mobile Devices suitable for Machine Learning||(Almost) everyone carries a mobile device and these contain many different applications. Recently there have been some advances in building machine learning capabilities into these applications. Various libraries and frameworks are being made available to enable machine learning on mobile devices. But is this really possible? This project will explore the various ML solutions for mobile devices, examine the various languages/solutions and evaluate the extent of ML capabilities on mobile devices. See Firebase ML Kit for an example. Others include Apple CoreML, TensorFlow Lite, etc.|
|DA & ASD||Personalized Advertisements based on Facial Recognition||Using a tablet device to deliver advertisements, monitor the facial reactions of the person watching to judge their level of interest in and attention to the adverts. Using this feedback, use machine learning to determine what adverts to display next. Using the feedback gathered from facial recognition and user engagement, build a full platform and architecture to support the delivery of the solution.|
|ASD||Real-time Database Monitoring tool||Build an application to monitor the database and all internal processes in real-time. It should provide informative visualizations and data insights on what is happening, using various trend analysis and ML techniques to identify anomalies and raise alerts. Can this tool be built to work with more than one database vendor?|
|DA & ASD||Augmented Data Analysis and Machine Learning||Build an augmented data analysis and machine learning tool capable of loading any data set, analyzing it, understanding it, visualizing the data, performing data enrichment, identifying feature engineering and identifying possible ML algorithms to use. All done automatically, with just a click of a button from the user. All they need to do is specify the data set.|
|DA & ASD||Data Indexing using Machine Learning||Check out the paper by the Google AI team on using neural networks as an alternative to B-tree indexes. Can you build something similar, can you improve on their design, can other ML algorithms be used, how does this scale, etc. There are lots and lots of possibilities with this project|
|DA||Analysing people's musical tastes||This project will look at examining the musical characteristics of a person's favorite music, taking in batches of various sizes to determine the optimal number of compositions needed to determine a style. The music will be broken down into key components and compared across all music in the batch. A similar approach can be used to analyze how music styles have evolved over time for different musicians.|
|ASD||Evaluation of Low Code development environments||In recent years a number of Low Code software development environments have evolved. This project will look at evaluating 3-5 of these to examine their features, development effort, developer skills, adoption within enterprises and what impact these types of low code development environments will have in future.|
|ASD||Walk with me – for the visually impaired||This project will look at using a Raspberry Pi enabled camera to allow the visually impaired to walk down a street unaided. The camera will constantly scan the environment, taking pictures in real time, scanning these images and then providing voice descriptions of the environment. This will allow the person to visualize their environment. When the person walks, the image and data processing will detect this and will feed motion related information to the user, such as certain objects getting closer or moving away. The system will also identify potential hazards such as people, rubbish bins and other obstacles. All image and other processing is to be performed on a Raspberry Pi.|
|DA & ASD||Real-time Anomaly Detection of Server or Database Alert logs||You need to have access to server or database activity logs for this project. Using anomaly detection, along with variations in time-periods, identify unusual activity and provide an appropriate level of notification and information about the alert.|
|DA||Synthetic data generation for imbalanced data sets||An examination of various methods for the generation of synthetic data for imbalanced data sets. Similar to the processing used in SMOTE, additional techniques will be used and evaluated to determine their effectiveness for input to machine learning.|
|ASD||Twitter profile follower generator using Go||Using Google's Go language, build a library for the Twitter API. Then use this library to create an application to allow a user to increase their number of followers on Twitter. Various approaches should be evaluated and implemented using the newly created library. The application should identify, based on an existing user profile, how to increase their number of followers.|
|DA & ASD||Using Music and Machine Learning for Database Monitoring||Combine your love of music and machine learning to monitor Database activity. This will involve monitoring the database engine logs and using anomaly detection with moving windows of data to identify and capture the moving trends in the activity logs. Then take this activity and compose music (based on your favorite artist or genre) as a reporting mechanism. When unusual activity is identified, this needs to be reflected in the music.|
|ASD||Database storage 4.0- Multi storage management for next stage database||The next phase of database management will see an integrated approach to how and where the data is stored. The past decade has seen the push by the Hadoop Eco-system to replace the traditional database. But given the install base of traditional databases and differing analytic requirements, a complete migration will not happen. In the multi-storage environment, data within the database can reside on one or more storage media and locations. These can include in-memory, flash, solid state, disk and also Hadoop. Based on the information lifecycle management approach for defining where data will reside, a new framework is needed to dynamically manage the movement of the data based on the frequency of usage. The project will look at how the data can be dynamically and efficiently migrated between storage media with the minimum of downtime.|
|ASD||Automation of VM builds and migrations using Vagrant, Ansible, Docker and virtual machines, and how to automate the migration of these to different Cloud vendors||Automation of VM builds and migrations using Vagrant, Ansible, Docker and virtual machines, and how to automate the migration of these to different Cloud vendors|
|DA||Evaluation of AutoML features across languages and tools||The use of Automated Machine Learning (AutoML) is going to replace the data scientist! Or so they say. This project will evaluate the various AutoML solutions proposed by various vendors and languages to measure how good they really are, and how likely companies and data scientists are to trust them.|
|DA & ASD||Building a repository for continuously evolving self monitoring predictive analytics models||This project will focus on building the process to manage the autonomous building and rebuilding of predictive models within an adaptive intelligence project. You will be working with many predictive and machine learning algorithms, building automated tools for the selection of the appropriate algorithms dependent on the underlying data sets. This project will integrate with the other Adaptive Intelligence projects with the aim of developing an integrated solution that can easily be deployed in any environment.|
|ASD||Is it really possible to build a Big Data cluster using Raspberry Pis||In the era of big data and IoT, the cost associated with building a clustered environment can be huge. This project will look at building a 5-node Hadoop cluster using Raspberry Pis and evaluate how effective it is at capturing data at different delivery rates. The data can be sent for storage on the cluster using Kafka. The second part will examine the efficiency of data analysis using this cluster. [The student will have to purchase the required equipment]|
|ASD||Evaluation of JSON and complex objects in Oracle, PostgreSQL, SQL Server and DB2||Most databases allow the creation of JSON objects within the database and the embedding of these into traditional database tables. This project will examine how the main database vendors have implemented these features and assess their capabilities and ability to scale. Additionally, most databases allow the creation of nested and other complex data structures. A similar evaluation will be performed on these.|
|ASD & DA||Expanding the analytical capabilities of the Database using the embedded JVM||Most enterprise level databases come with an inbuilt JVM. This allows you to create new functions within the database using Java. This project will take a number of machine learning and other analytical functions, write these in optimized Java code, store these in the database and then evaluate the performance of these features against the existing equivalent functions in the database.|
|ASD||Evaluation of live application upgrades with zero down time||Evaluation of solutions by leading vendors for live application and database upgrades with zero downtime. For example, Oracle has a tool called Edition Based Redefinition. Other vendors have similar products. This project will review these tools and will provide an evaluation and benchmark of their use.|
|ASD & DA||Evaluation of Big Data Machine Learning Languages||The Apache foundation has a number of machine learning projects. Some of these have a SQL interface. These include HiveMall, MADlib, Storm and others. This project will evaluate the machine learning capabilities of these languages, providing a number of worked scenarios. These scenarios will be benchmarked against each other.|
|ASD||Security issues of bi-directional cloud portability for applications||Many frameworks exist to help developers build applications for cloud native architectures, both those in the cloud as well as those behind the firewall. Applications are becoming more complex with many components sitting in serverless and container environments hosted in the Cloud and behind a corporate firewall. This project will examine the implications of such frameworks and technical architectures and present a number of alternative solutions|
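
As a starting point for the Algorithmic Trading project listed above, here is a minimal sketch of a moving-average crossover signal, the kind of simple statistical baseline that project asks you to compare the ML approaches against. The function names, window sizes and price series are illustrative assumptions only; they are not part of any project specification, and a real evaluation would need transaction costs, proper back-testing and real market data.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Return the simple moving average at each position.

    Positions that do not yet have a full window of prices are None.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    averages: List[Optional[float]] = []
    running_total = 0.0
    for i, price in enumerate(prices):
        running_total += price
        if i >= window:
            # Drop the price that has just fallen out of the window.
            running_total -= prices[i - window]
        averages.append(running_total / window if i >= window - 1 else None)
    return averages


def crossover_signals(prices: List[float], short_window: int = 3,
                      long_window: int = 5) -> List[str]:
    """Label each position 'buy', 'sell' or 'hold' (hypothetical rule).

    'buy'  : the short average crosses above the long average.
    'sell' : the short average crosses below the long average.
    """
    short = simple_moving_average(prices, short_window)
    long_ = simple_moving_average(prices, long_window)
    signals: List[str] = []
    prev_diff: Optional[float] = None
    for s, l in zip(short, long_):
        if s is None or l is None:
            signals.append("hold")  # not enough history yet
            continue
        diff = s - l
        if prev_diff is not None and prev_diff <= 0 < diff:
            signals.append("buy")
        elif prev_diff is not None and prev_diff >= 0 > diff:
            signals.append("sell")
        else:
            signals.append("hold")
        prev_diff = diff
    return signals


if __name__ == "__main__":
    # Made-up prices: a fall followed by a rise.
    demo_prices = [10, 9, 8, 7, 6, 7, 9, 12, 15, 18]
    print(crossover_signals(demo_prices, short_window=2, long_window=4))
```

A baseline like this is deliberately simple. The interesting question for the project is whether the more complex ML, NLP or sentiment-based approaches beat it by enough to justify their extra complexity.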