BigGorilla is an open-source, Python-based ecosystem for data integration and data preparation that helps data scientists integrate and analyze data. BigGorilla consolidates and documents the steps that data scientists typically take to bring data from different sources into a single database for analysis. For each of these steps, we document existing technologies and also point to desired technologies that could be developed.
The different components of BigGorilla are freely available for download and use. Data scientists are encouraged to contribute code, datasets, or examples to BigGorilla. We hope to promote education and training for aspiring data scientists with the development, documentation, and tools provided through BigGorilla.
We make many decisions on a daily basis. However, it is easy to be sidetracked by urgent needs and short-term goals and to neglect the activities that contribute to our long-term well-being and happiness. At RIT, one of our main research projects asks a basic question: can we develop technology that steers people toward behaviors that make them happier?
Our work is inspired by psychology research, especially a field known as Positive Psychology. We are developing "Jo", an agent that helps you record your daily activities, generalizes from them, and helps you create plans that increase your happiness. Naturally, this is no easy feat. Jo raises many exciting technical challenges for NLP, chatbot construction, and interface design. For example, how can we build an interface that is useful but not intrusive?
We are also working on creating a research platform that helps psychology researchers take advantage of advances in large-scale data collection and natural language processing. We hope the data science techniques Jo develops can be used to advance the state of the art in psychology research.
Natural Language Processing
We are working on advancing the state of the art in a set of technologies such as named entity extraction, information extraction, semantic parsing, synonym mining, and conversational interfaces. The goal of these technologies is to improve the quality of the services of Recruit Holdings. For example, high-quality extraction of job skills from resumes and job descriptions can improve the quality of matching resumes to job descriptions. Our goal at RIT is to make these techniques available and easily usable in production by engineers at the 120 companies of Recruit Holdings, and ultimately available as open-source software.
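To make the resume-matching example concrete, here is a minimal sketch of skill extraction followed by matching. It is not RIT's method: it assumes a tiny hand-written skill vocabulary and plain phrase lookup, where a production system would rely on learned extraction models. The names `SKILLS`, `extract_skills`, and `match_score` are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical skill vocabulary (an assumption for this sketch);
# a real system would curate or learn a far larger one.
SKILLS = {"python", "sql", "machine learning", "data analysis", "java"}


def extract_skills(text, vocabulary=SKILLS):
    """Return the vocabulary skills that occur in the text as whole words or phrases."""
    lowered = text.lower()
    found = set()
    for skill in vocabulary:
        # Word boundaries keep "java" from matching inside "javascript".
        if re.search(r"\b" + re.escape(skill) + r"\b", lowered):
            found.add(skill)
    return found


def match_score(resume_text, job_text):
    """Fraction of the skills named in the job description that the resume also names."""
    job_skills = extract_skills(job_text)
    if not job_skills:
        return 0.0
    resume_skills = extract_skills(resume_text)
    return len(resume_skills & job_skills) / len(job_skills)


resume = "Five years of Python and SQL, plus some JavaScript."
job = "Looking for Python, SQL, and machine learning experience."
print(sorted(extract_skills(resume)))
print(match_score(resume, job))
```

Even in this toy form, the quality of the match depends entirely on the quality of the extraction step: a skill that the extractor misses or mislabels cannot contribute to the score.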
Usagi is a data discovery system for Recruit's internal infrastructure. Usagi crawls metadata within Recruit's web services every day and builds a catalog of datasets. Usagi enables users to search, monitor, and annotate the metadata, which helps them discover the appropriate data for their analysis.
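The catalog idea can be illustrated with a small in-memory sketch. This is not Usagi's implementation or API: the `DatasetEntry` and `Catalog` classes, their fields, and the sample dataset names are all hypothetical, and a real system would persist the catalog and crawl live services rather than register entries by hand.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    # Hypothetical metadata fields, chosen for illustration only.
    name: str
    owner: str
    columns: list
    annotations: list = field(default_factory=list)


class Catalog:
    """Toy metadata catalog supporting registration, annotation, and keyword search."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        """Add or refresh an entry, as a daily crawl might, keeping earlier annotations."""
        previous = self._entries.get(entry.name)
        if previous is not None:
            entry.annotations = previous.annotations
        self._entries[entry.name] = entry

    def annotate(self, name, note):
        """Attach a free-text note to a registered dataset."""
        self._entries[name].annotations.append(note)

    def search(self, keyword):
        """Return sorted names of datasets whose metadata or notes contain the keyword."""
        keyword = keyword.lower()
        hits = []
        for entry in self._entries.values():
            haystack = [entry.name, entry.owner, *entry.columns, *entry.annotations]
            if any(keyword in text.lower() for text in haystack):
                hits.append(entry.name)
        return sorted(hits)


catalog = Catalog()
catalog.register(DatasetEntry("job_postings", "search-team", ["job_id", "title", "skills"]))
catalog.register(DatasetEntry("page_views", "web-team", ["user_id", "url", "viewed_at"]))
catalog.annotate("page_views", "Use for funnel analysis of job applications")
print(catalog.search("job"))
```

Searching over annotations as well as crawled metadata is the design choice worth noting here: a note written by one user can make a dataset discoverable to another, even when the column names alone give no hint.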
Many large enterprises today are seeing an explosion in the number of datasets they hold. Therefore, we aim to make Usagi open source. We hope Usagi helps data users in these enterprises discover meaningful data easily.