Research & Development
Our research & development in HLT
We combine research experts and the essential technical and administrative support in order to conduct cutting-edge research in text technology and use that as the basis for the development of innovative and relevant technological applications.
We establish ourselves as leaders in the field of Human Language Technology within South Africa and promote multilingualism and diversity within the digital environment.
Our Research
Publications
Research outputs in 2023
Deep learning and low-resource languages: How much data is enough? A case study of three linguistically distinct South African languages
Gaustad, T. & Eiselen, E.R. 2023. Data in Brief. April 2023
A dataset of self-reported attitudes to Afrikaans swearwords
Van Huyssteen, G.B., Eiselen, E.R., Du Toit, J.S. 2023. Journal of Open Humanities Data. 2023
Translation Technology in South Africa
Van Huyssteen, G.B., Puttkammer, M.J., McKellar, C.A., Griesel, M. 2023. In Routledge Encyclopedia of Translation Technology, edited by S.W. Chan. Routledge, 373-383
Ouderdoms- en inhoudsadvies vir Afrikaanse boeke vir kinders: resultate van ’n eerste kwalitatiewe en kwantitatiewe ondersoek
Van Huyssteen, G.B., Rabé, M., and Puttkammer, M.J. 2023. LitNet Akademies (Geesteswetenskappe) 20(1):185–212
'n Empiriese vergelyking van die potensiële aanstootlikheid van enkele skelnaampare in Afrikaans [An empirical comparison of the potential offensiveness of some epithet pairs in Afrikaans]
Van Huyssteen, G.B., Koekemoer, S. 2023. Tydskrif vir Geesteswetenskappe 63:560–584
Investigating the extent and usability of webtext available in South Africa's official languages
De Wet, F., Eiselen, E.R., Schillack, E., Puttkammer, M.J. 2023. Artificial Intelligence Research. SACAIR 2023. Communications in Computer and Information Science. Springer
IsiXhosa Named Entity Recognition Resources
Eiselen, E.R. & Bukula, A. 2023. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 22:2. pp. 1-19
A methodology for the description of constructionalisation networks: Constructions with [in] as a case study
Van Huyssteen, G.B., Breed, A., Butler, A., Botha, L., Partridge, M., and Pilon, S. 2023. Stellenbosch Papers in Linguistics
A comparison of Statistical Tests for Likert-type data: The case of swearwords
Eiselen, E.R. and Van Huyssteen, G.B. 2023. Journal of Open Humanities Data
Research outputs in 2022
isiXhosa named entity recognition resources
Eiselen, E.R.E., & Bukula, A. 2022. IsiXhosa Named Entity Recognition Resources. ACM Trans. Asian Low-Resour. Lang. Inf. Process, 22:2 pp. 1-19
Linguistically annotated dataset for four official South African languages with a conjunctive orthography: IsiNdebele, isiXhosa, isiZulu, and Siswati
Gaustad, T. & Puttkammer, M.J. 2022. Data in Brief. Volume 41, April 2022, 107994
Research outputs during 2021
Standaardisering as ’n produk van die tydsgees
Van Huyssteen, G.B. & Pilon, S. 2021. Ontlaering – Geworteldheid: Die onderrig van Afrikaans in spesifieke ruimtes
When a word is befok
Van Huyssteen, G.B. 2021. Afrikaans Grammar Workshop III
How Afrikaans women became fierce-tempered
Van Huyssteen, G.B. & Eiselen, E.R. 2021. Zürich Workshop on Afrikaans Linguistics.
Swearing in South Africa: Multidisciplinary research on language taboos
Van Huyssteen, G.B. 2021. International Conference of the Digital Humanities Association of Southern Africa 2021
Using ordinal logistic regression to analyse self-reported usage of, and attitudes towards swearwords
Van Huyssteen, G.B. & Eiselen, E.R. 2021. International Conference of the Digital Humanities Association of Southern Africa 2022
Development of linguistically annotated parallel language resources for four South African languages.
Gaustad, T. & Puttkammer, M.J. 2021. 2nd Workshop on Resources for African Indigenous Languages (RAIL 2021), co-located with DHASA 2021.
Canonical Segmentation and Syntactic Morpheme Tagging of Four Resource-scarce Nguni Languages
Du Toit, J.S. & Puttkammer, M.J. 2021. 2nd Workshop on Resources for African Indigenous Languages (RAIL 2021), co-located with DHASA 2021
Oor feekse en helleveë [On shrews and harridans]
Van Huyssteen, G.B. & Eiselen, E.R.E. 2021. Tydskrif vir Geesteswetenskappe
Quantitative analysis of Sesotho sa Leboa part of speech taggers
Mathe, D.S. and Eiselen, E.R.E. 2021. South African Journal of African Languages
Content developers as stakeholders in the blended learning ecosystem: The Virtual Institute for Afrikaans’ Language Education Portal as a case study
Breed, A., Fouché, N., Brink, N., Coetzee, M., Erasmus, C., Kapp, S., Pilon, S., Huyssteen, G.B. and Wierenga, R. 2021. Re-Envisioning and Restructuring Blended Learning for Underprivileged Communities
Research outputs from 2016 to 2020
2020
NCHLT Web Services and CTexTools
Puttkammer, M. 2020. Tour de CLARIN, vol. III
Viability of Neural Networks for Core Technologies for Resource-Scarce Languages
Loubser, M. & Puttkammer, M.J. 2020. Information 11(1), 41
Dataset for comparable evaluation of machine translation between 11 South African languages
McKellar, C.A. & Puttkammer, M.J. Data in Brief, Volume 29, 2020, 105146, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2020.105146.
Die /r/ in Afrikaans: Fonetiese en fonologiese eienskappe
Wissing, D.P. & Pienaar, W. 2020. Literator
Afrikaans
Wissing, D.P. 2020. Journal of the International Phonetic Association
2019
“Wat gaan word van geskrewe Standaardafrikaans? [What is going to happen to written Standard Afrikaans?]”
VAN HUYSSTEEN, G.B. 2019. In Van der Elst, J. (ed.). SA Akademie vir Wetenskap en Kuns: Verlede, hede toekoms (1909-2019). 86-89. ISBN: 978-0-949976-97-0. Pretoria: SAAWK.
Herbesoek aan Afrikaanse klemtoon: is dit (nog) ’n inisiëleklemtoontaal?
Wissing, D.P. LitNet Akademies, 16.2 (2019): 214-239.
Perspektief op /ɛ/-verlaging in Afrikaans.
Wissing, D.P. LitNet Akademies, 16.1 (2019): 166-206.
2018
The Hulle en Goed Constructions in Afrikaans.
VAN HUYSSTEEN, G.B. 2018.
Stabilising determinants in the transmission of phonotactic systems: Diachrony and acquisition of coda clusters in Dutch and Afrikaans
Wissing, D.P. 2018.
The Status of Tone in Sesotho: A Production and Perception Study.
Wissing, D.P. 2018.
Naar een Wikifonia.
VAN OOSTENDORP, M., VISSER, W. & WISSING, D. Nederlandse Taalkunde, 23.2 (2018): 141-150.
Die ontwikkeling van [ʃ] in Afrikaans
WISSING, D. 2018. Literator. 39. 10.4102/lit.v39i2.1486.
2017
Afrikaanse Woordelys en Spelreëls [Afrikaans Wordlist and Spelling Rules].
TAALKOMMISSIE VAN DIE SUID-AFRIKAANSE AKADEMIE VIR WETENSKAP EN KUNS (COMP). 2017. Eleventh edition. ISBN (printed): 978-1-86890-207-1; ISBN (online): 978-1-86890-208-8. Cape Town, Pharos, 775pp.
Voorwoord [Preface].
VAN HUYSSTEEN, G.B. 2017. In: Suid-Afrikaanse Akademie vir Wetenskap en Kuns. Afrikaanse woordelijs en spelreëls. Faksimilee-uitgawe [Afrikaans wordlist and spelling rules. Facsimile edition]. Pretoria: Protea Boekhuis.
Morfologie. [Morphology].
VAN HUYSSTEEN, G.B. 2017. In: Carstens, WAM & Bosman, N. (reds.). Kontemporêre Afrikaanse Taalkunde. [Contemporary Afrikaans Linguistics]. Second edition. ISBN 978-0-627-03437-4. Pretoria: Van Schaik Uitgewers. pp. 177-214.
Plosive voicing in Afrikaans: differential cue weighting and sound change.
WISSING, D.P. 2017. Journal of Linguistics
Elektroniese woordeboeke en die Afrikaanse gemeenskap [Electronic dictionaries and the Afrikaans community]
VAN HUYSSTEEN, G.B. & Luther, J. 2017. Gents colloquium over het Afrikaans [Ghent colloquium on Afrikaans], University of Ghent, Ghent, Belgium.
Constructionist perspectives on two competing associative plural constructions.
VAN HUYSSTEEN, G.B. 2017. 11th International Mediterranean Morphology Meeting, Nicosia, Cyprus.
2016
South African Language Resources: Phrase Chunking.
EISELEN, R. 2016. Tenth International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia. pp. 689-693
Government Domain Named Entity Recognition for South African Languages.
EISELEN, R. 2016. Tenth International Conference on Language Resources and Evaluation, LREC 2016, Portorož, Slovenia. pp. 3344-3348.
Optical character recognition for South African Languages.
PUTTKAMMER, M.J. & HOCKING, J. 2016. PRASA 2016.
The effect of respondents' skill levels in collaborative data annotation.
PUTTKAMMER, M.J. & VAN HUYSSTEEN, G.B. 2016. "Under-resourced Languages, Collaborative Approaches and Linked Open Data : Resources, Methods and Applications". Springer. Language Resources and Evaluation Journal Special Issue.
AfriBooms: An Online Treebank for Afrikaans.
VAN HUYSSTEEN, G.B., AUGUSTINUS, L., VAN EYNDE, F., VAN NIEKERK, D., SCHUURMAN, I. & VANDEGHINSTE, V. 2016. LREC 2016.
A stepwise methodology for establishing natural language processing evaluation reliability.
EISELEN, R. & VAN HUYSSTEEN, G.B. 2016. Language Resources and Evaluation.
What French for Gabonese French Lexicography.
NDINGA-KOUMBA, S., ASSAM, B.N. & OMPOUSSA, V. 2016. Lexikos. 26(2016): 1-31
Die Virtuele Instituut vir Afrikaans (VivA) en markbehoeftes in die Afrikaanse gemeenskap.
VAN HUYSSTEEN, G. B., BOTHA, M. & ANTONITES, A. 2016. Tydskrif vir Geesteswetenskappe. 56(2-1): 410-437.
Research outputs from 2011 to 2015
2015
Afrikaans and Dutch as closely-related languages: A comparison to West Germanic languages and Dutch dialects.
HEERINGA, W., DE WET, F. & VAN HUYSSTEEN, G.B. 2015. Stellenbosch Papers in Linguistics Plus. 47(2015): 1-18.
Planning and Macrostructural Elements for a Multilingual Culinary Dictionary of Gabonese Languages.
OMPOUSSA, V. & NDINGA-KOUMBA-BINZA, S. 2015. Lexikos. 25 (2015): 507-524.
Translation Technology in South Africa.
VAN HUYSSTEEN, G.B. & GRIESEL, M. 2015. In: Chan, S-W. (ed.). Routledge Encyclopedia of Translation Technology. ISBN: 978-0-415-52484-1. New York: Routledge. 326-336pp.
Aan die en besig in Afrikaanse progressiwiteitskonstruksies : 'n korpusondersoek (2) : navorsings- en oorsigartikel.
VAN HUYSSTEEN, G.B. & BREED, A. 2015. Tydskrif vir Geesteswetenskappe. 55(2):251-269.
Palatalisation of /s/ in Afrikaans.
WISSING, D.P., PIENAAR, W. & VAN NIEKERK, D. 2015. Spilplus. 48(2015): 137-158.
Bilingual speech rhythm: Spanish-Afrikaans in Patagonia.
COETZEE, A.W., LORENZO, G.A., HENRIKSEN, A. & WISSING, D.P. 2015. In The Scottish Consortium for ICPhS 2015, eds. Proceedings of the 18th International Congress of Phonetic Sciences. London: International Phonetic Association.
HLT and the changing face of translation - a CTexT perspective.
FOURIE, W. 2015. (In Boers, M. ed. Proceedings of the South African Translators' Institute's Second Triennial Conference. Johannesburg: SATI. p. 18-20.) ISBN: 978-0-620-68208-4
Research outputs from 2007 to 2010
2007
Accelerating the Annotation of Lexical Data for Less-Resourced Languages.
Van Huyssteen, G.B. & Puttkammer, M.J. 2007. Accelerating the Annotation of Lexical Data for Less-Resourced Languages. (In Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007). p. 1505-1508.)
Evaluating Wrapped Progressive Sampling for Automatic Algorithmic Parameter Optimisation.
GROENEWALD, H.J., VAN HUYSSTEEN, G.B. & PUTTKAMMER, M.J. 2007. (In Angelova, G., Bontcheva, K., Mitkov, R., & Nikolov, N., eds. Proceedings of Recent Advances in Natural Language Processing 2007, Borovets, Bulgaria. p. 251-255.)
Using Machine Learning to Annotate Data for NLP Tasks Semi-Automatically
Van Huyssteen, G.B., Puttkammer, M.J., Pilon, S., & Groenewald, H.J. 2007. (In Orasan, C. & Kuebler, S., eds. Proceedings of International Workshop on Computer-Aided Language Processing, Borovets, Bulgaria.)
Accelerating the Annotation of Lexical Data for Less-Resourced Languages
Van Huyssteen, G.B. & Puttkammer, M.J. 2007. Presentation delivered at the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007), Antwerp
Datagebaseerde Aspekte van Afrikaanse Reduplikasies
Van Huyssteen, G.B. & Wissing, D.P. 2007. Southern African Linguistics and Applied Language Studies, 25(3): 419-439
Global and local durational properties in three varieties of South African English
Coetzee, A.W. & Wissing, D.P. 2007. Linguistic Review, 24:263-289
Gevorderde akoestiese korrelate van Afrikaanse klemtoon
Wissing, D.P. 2007. Southern African Linguistics and Applied Language Studies, 25
Basiese akoestiese korrelate van klemtoon in Afrikaans
Wissing, D.P. 2007. Southern African Linguistics and Applied Language Studies, 25:441-458
Testing the use of Lessac's Tonal NRG as a voice building tool for female students at a South African University
Munro, M. & Wissing, D.P. 2007. Voice and Speech Review
Automatic Parameter Selection for Effective Afrikaans Lemmatisation.
Groenewald, H.J., Van Huyssteen, G.B. & Puttkammer, M.J. 2007. Presentation delivered at the Recent Advances in Natural Language Processing (RANLP) 2007, Borovets, Bulgaria.
Heroorweging van Fleksie in Afrikaans
Van Huyssteen, G.B. & Groenewald, H.J. 2007. Presentation delivered at LVSA/SAALA/SAVTO 2007, NWU, Potchefstroom campus
Requirements for Machine-Aided translation Tools
Van Huyssteen, G.B. & Groenewald, H.J. 2007. Presentation delivered at LVSA/SAALA/SAVTO 2007, NWU, Potchefstroom campus
Feature Selection and Parameter Optimisation for Effective Afrikaans Lemmatisation
Van Huyssteen, G.B. & Groenewald, H.J. 2007. Presentation delivered at the International 17th Meeting of Computational Linguistics in the Netherlands (CLIN) 2007, University of Leuven, Leuven.
ʼn Fleksievormgenereerder
Pilon, S. 2007. Presentation delivered at LVSA/SAALA/SAVTO 2007, NWU, Potchefstroom campus
MT for English-isiZulu/Afrikaans
Pilon, S. & Pienaar, J.A. 2007. Presentation delivered at LVSA/SAALA/SAVTO 2007, NWU, Potchefstroom campus
Lexicon Creation and Management: TurboAnnotate
Van Huyssteen, G.B. & Puttkammer, M.J. 2007. Presentation delivered at LVSA/SAALA/SAVTO 2007, NWU, Potchefstroom campus
Developing Web-Based Word-Translators
Van Huyssteen, G.B., Puttkammer, M.J. & Schlemmer, M. 2007. Presentation delivered at LVSA/SAALA/SAVTO 2007, NWU, Potchefstroom campus
Nadruk in Afrikaans: akoestiese kenmerke en metodologiese oorwegings by die vasstellings daarvan
Wissing, D.P. 2007. Presentation delivered at LVSA/SAALA/SAVTO 2007, NWU, Potchefstroom campus
More on acoustic correlates of stress.
Wissing, D.P. 2007. Presentation delivered at the 8th Annual Conference of the International Speech Communication Association (Interspeech 2007), Antwerp.
Development
All the latest on our Projects
Have a look at the recent projects we've completed for our various clients. Each project page describes how the resources and/or software were developed, and the outputs are available for download.
PROJECTS
Autshumato
Autshumato encompasses a series of projects that develop machine translation systems for South African languages. Here you'll find the work we've done for the various tools offered within Autshumato.
PROJECTS
SADiLaR
We are the official text node for SADiLaR, with a focus on the advancement of multilingualism. We develop text resources for our under-resourced languages, which are crucial for development in big data and artificial intelligence in the South African context. Here we develop linguistically enriched corpora, core technologies and proofing tools.
PROJECTS
VivA Afrikaans
We've been collaborating with the Virtuele Instituut vir Afrikaans (VivA) by maintaining the content and technical services of the Corpus and Dictionary Portals. With over 85 million words in the Corpus Portal and over 50 dictionaries and word lists in the Dictionary Portal, we make sure that the systems are up to date with the latest etymology, spelling, and meanings.
We ensure long-term sustainability for research and development activities. This establishes valuable partnerships with academic and industry partners with an interest in natural language processing and computational linguistics.
Software & Resources
Have a look at our applied technologies.
Corpora
Our compilation of collections of texts with a focus on resource-scarce languages of South Africa for further research and development.
Core Technologies
Morphological analysers, part-of-speech (POS) taggers and lemmatisers are the core technologies for which we develop resources.
Translation Aids
Have a look at our work within machine translation and other tools within our Autshumato projects and our Spelling Checkers.