Corpus Linguistics

FAQs about using Sketch Engine @ University of Portsmouth

  • Who can access it?
    Anyone at the University of Portsmouth. We have an institution-wide licence, which works by IP recognition.
  • Can I access it from home?
    Yes, but you will have to set up the proxy so that Sketch Engine can recognise that you are from the University of Portsmouth. See the proxy help page for more information.
  • What languages are included?
    Arabic, Bengali, Bulgarian, Chinese, Croatian, Czech, Dutch, English, Estonian, Finnish, French, German, Greek, Hebrew, Hindi, Igbo, Indonesian, Italian, Latin, Malay, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Setswana, Slovenian, Spanish, Swahili, Swedish, Telugu, Thai, Turkish, Vietnamese, Welsh.
  • What kinds of texts are included?
    They are mainly very large collections of internet-based texts, but for some languages there are also parallel corpora from the European Parliament, and for English there are general corpora and corpora of academic English, newspaper discourse, and child language.

These activities were originally developed for a workshop by Charlotte Taylor as part of the Languages across Borders series.

The idea behind the tasksheets is that you should be able to explore whatever you want. So, just have a look at the ‘I want to know’ items below and click on the one that you would like to try.
Each of the short tasks could be adapted to another query and another corpus; they are just there to help you get to know Sketch Engine. Once you have done a few, you may want to develop your own questions, so play around!

Tasks for getting started

  1. I want to know how to see a word in its context
  2. I want to know how to see a list of word forms for a particular lemma
  3. I want to know how to find the differences between two near synonyms
  4. I want to know how to find out how a particular word behaves, for example whether it usually collocates with positive or negative things
  5. I want to know how to find out which verb is usually used with a particular noun
  6. I want to know how to find out which adverbs are usually used with a particular adjective
  7. I want to know how to compare two (potential) translation equivalents
  8. I want to know how to find out how other people have translated a particular word
  9. I want to know how to find out which adjectives are used most frequently in a particular discourse type (e.g. Academic Spoken English)
  10. I want to know how to look for patterns (e.g. what else can be used in the form ‘a couple of sandwiches short of a picnic’)
  11. I want to know how to find words which occur in similar lexical patterns to my search word
  12. How can I find which words are typical of one corpus compared to another? (e.g. spoken language vs. written language)

User queries

These are not 'starting points' but answers to questions which are likely to come up.

  1. How can I re-order the concordance lines?
  2. How can I see more context for my concordance lines?
  3. How can I search for more than one thing at once?
  4. How can I specify words that I don’t want in the co-text?
  5. How can I use wildcards in searches?
  6. How can I use part-of-speech tags in searches?
  7. How many different ways are there for investigating collocation?

For more information, you could also look at the Sketch Engine help pages and the lessons on using Sketch Engine with the BAWE corpus, put together by Hilary Nesi (Coventry University, UK) and Paul Thompson (University of Reading, UK).