Building a training set of tags for r #297

Closed
iHiD opened this issue Nov 1, 2023 · 22 comments

Comments

@iHiD
Member

iHiD commented Nov 1, 2023

Hello lovely maintainers 👋

We've recently added "tags" to students' solutions. These express the constructs, paradigms and techniques that a solution uses. We are going to use these tags for lots of things, including filtering, pointing a student to alternative approaches, and much more.

In order to do this, we've built out a full AST-based tagger in C#, which has allowed us to do things like detect recursion or bit shifting. We've set things up so other tracks can do the same for their languages, but it's a lot of work, and we've determined that it may actually be unnecessary. Instead, we think that we can use machine learning to achieve tagging with good enough results. We've fine-tuned a model that can determine the correct tags for C# from the examples with a high success rate. It's also doing reasonably well in an untrained state for other languages. We think that with only a few examples per language, we can potentially get some quite good results, and that we can then refine things further as we go.
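
To give a rough feel for what syntax-tree-based tagging means for this track, here is a minimal sketch in R. It is not the C# tagger described above and not part of any Exercism tooling: the function name `tag_constructs` and the mapping from R syntax to tag names are assumptions made purely for illustration. It only uses base R's `parse()` and `all.names()`.

```r
# Hypothetical helper, not part of Exercism's tooling: report a few
# construct tags by checking which operators and keywords appear in the
# parse tree of some R source code.
tag_constructs <- function(code) {
  # Illustrative mapping from R syntax elements to tag names (an assumption)
  tag_map <- c(
    "for"    = "construct:for-loop",
    "while"  = "construct:while-loop",
    "if"     = "construct:if",
    "<-"     = "construct:assignment",
    "+"      = "construct:add",
    "return" = "construct:return"
  )
  # parse() builds the syntax tree; all.names() lists every symbol in it,
  # including function and operator names
  names_used <- all.names(parse(text = code, keep.source = FALSE))
  unname(tag_map[names(tag_map) %in% names_used])
}

solution <- "
hamming <- function(s1, s2) {
  count <- 0
  for (i in seq_along(s1)) {
    if (s1[i] != s2[i]) count <- count + 1
  }
  return(count)
}"
tags <- tag_constructs(solution)
print(tags)
```

A name-based lookup like this cannot see comments or tell types apart, so it only hints at why a full tagger is a lot of work per language.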

I released a new video on the Insiders page that talks through this in more detail.

We're going to be adding a fully-fledged UI in the coming weeks that allows maintainers and mentors to tag solutions and create training sets for the neural networks, but to start with, we're hoping you would be willing to manually tag 20 solutions for this track. In this post we'll add 20 comments, each with a student's solution, and the tags our model has generated. Your mission (should you choose to accept it) is to edit the tags on each issue, removing any incorrect ones, and adding any that are missing. In order to build one model that performs well across languages, it's best if you stick as closely as possible to the C# tags. Those are listed here. If you want to add extra tags, that's totally fine, but please don't arbitrarily reword existing tags, even if you don't like what Erik's chosen, as it'll just make it less likely that your language gets the correct tags assigned by the neural network.
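
When editing a tag list by hand, a quick sanity check for duplicates and malformed entries may help. The sketch below is only an illustration: `check_tags` is a hypothetical helper, and the set of categories (`construct`, `paradigm`, `technique`, `uses`) is an assumption based on the tags that appear in this thread, not an official list.

```r
# Hypothetical helper: flag duplicated tags and entries that do not look
# like "category:name". The category list is assumed from this thread.
check_tags <- function(tags) {
  known_categories <- c("construct", "paradigm", "technique", "uses")
  pattern <- paste0(
    "^(", paste(known_categories, collapse = "|"), "):[a-z0-9-]+$"
  )
  list(
    duplicated = unique(tags[duplicated(tags)]),
    malformed  = tags[!grepl(pattern, tags)]
  )
}

result <- check_tags(c(
  "construct:string",
  "construct:for-loop",
  "construct:string",
  "No tags generated"
))
print(result)
```

This only checks the shape of each tag. Whether a tag actually applies to the solution still needs a human read of the code.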


To summarise - there are two paths forward for this issue:

  1. You're up for helping: Add a comment saying you're up for helping. Update the tags some time in the next few days. Add a comment when you're done. We'll then add them to our training set and move forward.
  2. You're not up for helping: No problem! Just please add a comment letting us know :)

If you tell us you're not able/wanting to help or there's no comment added, we'll automatically crowd-source this in a week or so.

Finally, if you have questions or want to discuss things, it would be best done on the forum, so the knowledge can be shared across all maintainers in all tracks.

Thanks for your help! 💙


Note: Meta discussion on the forum

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: hamming

Code

# This is a stub function to take two strings
# and calculate the hamming distance
hamming <- function(strand1, strand2) {
  
  if(nchar(strand1) == nchar(strand2)) {
  
    count <- 0
    s1 <- strsplit(strand1, "")[[1]]
    s2 <- strsplit(strand2, "")[[1]]
    for(i in seq(length(s1))){
      if(s1[i] != s2[i]){
        count <- count + 1
      }
    }
    return(count)
  
  } else {
      return(NULL)
    }
}

Tags:

construct:add
construct:assignment
construct:comment
construct:for-loop
construct:if
construct:indexing
construct:int
construct:integral-number
construct:invocation
construct:logical-and
construct:null
construct:nullability
construct:number
construct:parameter
construct:return
construct:string
construct:variable
paradigm:imperative
paradigm:functional
technique:boolean-logic
technique:looping

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: hamming

Code

# This is a stub function to take two strings
# and calculate the hamming distance
hamming <- function(strand1, strand2) {
  size <- length(strand1)
  hamm <- 0
  for (i in 0:size) {
    if (strand1[i] != strand2[i]) {
      hamm <- hamm + 1
    }
  }
  return(hamm)
}

Tags:

construct:assignment
construct:comment
construct:for-loop
construct:if
construct:indexing
construct:int
construct:integral-number
construct:invocation
construct:length
construct:number
construct:parameter
construct:return
construct:string
construct:variable
paradigm:imperative
paradigm:functional
technique:looping

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: raindrops

Code

raindrops <- function(x = 0) {
  # Check if is integer
  if(is.integer(x)){
    # Extract factors
  x <- as.integer(x)
  div <- seq_len(abs(x))
  factors <- div[x %% div == 0L]
  } 
  if (any(factors %in% c(3, 5, 7))) {
    # Substitute numbers for words
    raindrop <- ifelse(factors == 3, "Pling",
                       ifelse(factors == 5, "Plang",
                              ifelse(factors == 7, "Plong", NA)))
    raindrop <- raindrop[!is.na(raindrop)]
    raindrop <- paste(raindrop, collapse = "")
    
    return(raindrop)
    # if no keyword, return original value
  } else {return(x)}
  
}

Tags:

construct:assignment
construct:boolean
construct:comment
construct:if-else
construct:implicit-conversion
construct:indexing
construct:integer
construct:logical-operator
construct:number
construct:optional-parameter
construct:parameter
construct:return
construct:vector
paradigm:imperative
paradigm:functional
technique:boolean-logic

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: grains

Code

square <- function(n) {
  sqr <- vector("numeric", length = n)
  sqr[1] <- 1
  for(i in seq(2, n)){
    sqr[i] <- sqr[i-1]*2
  }
  
  sqr[n]
  
}


total <- function() {
  sqr <- vector("numeric", length = 64)
  sqr[1] <- 1
  for(i in seq(2, 64)){
    sqr[i] <- sqr[i-1]*2
  }
  sum(sqr)
  
}

Tags:

construct:assignment
construct:for-loop
construct:indexing
construct:invocation
construct:number
construct:parameter
construct:return
construct:subtract
construct:variable
construct:vector
paradigm:functional
paradigm:imperative
technique:looping

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: anagram

Code

anagram <- function(subject, candidates) {

}

Tags:

construct:parameter
paradigm:functional

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: space-age

Code

space_age <- function(seconds, planet) {
	earth_orbit <- 365.25


   
      orbital_data <- list(mercury = earth_orbit * 0.2408467, 
        earth = earth_orbit,  
      	venus = earth_orbit *  0.61519726 ,   
      	mars = earth_orbit * 8808158 ,   
      	jupiter= earth_orbit *  11.862615 ,
   		saturn = earth_orbit * 29.447498 , 
    	uranus = earth_orbit * 84.016846 ,
    	neptune = earth_orbit * 164.79132)

     ans <- seconds/(orbital_data[[planet]]*24*3600)
     return(round(ans,2))
}

Tags:

construct:assignment
construct:divide
construct:double
construct:floating-point-number
construct:implicit-conversion
construct:indexing
construct:invocation
construct:list
construct:multiply
construct:number
construct:parameter
construct:return
construct:variable
paradigm:imperative
paradigm:functional

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: sum-of-multiples

Code

sum_of_multiples <- function(factors, limit) {
  
}

Tags:

construct:parameter
paradigm:functional

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: word-count

Code

word_count <- function(words) {
	as.list(table(strsplit(words, " ")[[1]]))
}

Tags:

construct:list
construct:string
construct:explicit-conversion
paradigm:functional

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: phone-number

Code

parse_phone_number <- function(number_string) {
  digits <- as.vector(str_match_all(number_string, "\\d")[[1]])
  n <- length(digits)
  if (n < 10 | n > 11) {
    NULL
  }
  else if (n == 11 && digits[[1]] != 1) { # area code != 1
    NULL
  }
  else {
    area_code <- digits[[n-9]]
    exchange_code <- digits[[n-6]]
    if (area_code < 2 | exchange_code < 2) {
      NULL
    }
    else {
      paste(digits[(n-10+1):n], sep="", collapse="")
    }
  }
}

Tags:

construct:add
construct:assignment
construct:boolean
construct:comment
construct:if
construct:integer
construct:logical-and
construct:logical-or
construct:named-argument
construct:number
construct:parameter
construct:string
construct:subtract
construct:vector
paradigm:imperative
paradigm:functional
technique:boolean-logic
technique:regular-expression
uses:stringr

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: isogram

Code

is_isogram <- function(word) {

}

Tags:

No tags generated

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: beer-song

Code

lyrics <- function(first, last) {
  paste(sapply(seq(first, last, by = -1), verse), collapse = "\n")
}

verse <- function(number) {

  if (number > 2) {
    text <- c(paste0(number, " bottles of beer on the wall, ", number, " bottles of beer."),
              paste0("Take one down and pass it around, ", number - 1, " bottles of beer on the wall.\n"))
  } else if (number == 2) {
      text <- c(paste0(number, " bottles of beer on the wall, ", number, " bottles of beer."),
                paste0("Take one down and pass it around, ", number - 1, " bottle of beer on the wall.\n"))
  } else if (number == 1) {
    text <- c(paste0(number, " bottle of beer on the wall, ", number, " bottle of beer."),
              paste0("Take it down and pass it around, no more bottles of beer on the wall.\n"))
  } else {
    text <- c("No more bottles of beer on the wall, no more bottles of beer.",
              "Go to the store and buy some more, 99 bottles of beer on the wall.\n")
  }
  
  return(paste(text, collapse = "\n"))
  
}

Tags:

construct:string
construct:assignment
construct:by-convention
construct:c
construct:char
construct:comment
construct:for-loop
construct:if
construct:implicit-loop
construct:indexing
construct:integer
construct:invocation
construct:lambda
construct:named-argument
construct:number
construct:parameter
construct:return
construct:string
construct:subtract
construct:variable
paradigm:functional
paradigm:imperative
paradigm:object-oriented
technique:higher-order-functions
technique:looping

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: perfect-numbers

Code

library(magrittr)

is_perfect <- function(n){
  
  # catch edge cases
  if (n <= 0)
    stop("Only natural number can be classified here!")
  if (n <= 2)
    return("deficient")
  
  # find n's factors, incl. 1 but not n (aliquots)
  factor <- function(i)
    if (n %% i == 0)
      i
  
  # calculate sum and classify n
  lapply(1:(n/2), factor) %>% 
    unlist %>% 
    sum ->
    sum
    
  dplyr::case_when(
    sum == n ~ "perfect",
    sum < n ~ "deficient",
    sum > n ~ "abundant"
  )
}

Tags:

construct:assignment
construct:back-tick
construct:comment
construct:divide
construct:double
construct:equals
construct:floating-point-number
construct:if
construct:implicit-conversion
construct:integer
construct:integral-number
construct:invocation
construct:lambda
construct:library
construct:method
construct:number
construct:parameter
construct:return
construct:string
construct:subtract
construct:variable
paradigm:functional
paradigm:imperative
paradigm:object-oriented
technique:exceptions
technique:higher-order-functions

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: prime-factors

Code

prime_factors <- function(number) {
  out <- c()
  if(number > 1) {
  for(n in c(2:number)){
    while((number / n) %% 1 == 0) {
      out <- c(out, n)
      number <- number /n}
    if(n == number) {break}
  }
  out
  } else {out}
}

Tags:

construct:assignment
construct:break
construct:divide
construct:double
construct:for-loop
construct:if
construct:implicit-conversion
construct:invocation
construct:logical-and
construct:number
construct:parameter
construct:return
construct:variable
construct:while-loop
paradigm:imperative
paradigm:functional
paradigm:object-oriented
technique:boolean-logic
technique:looping

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: largest-series-product

Code

largest_series_product <- function(digits, span){

}

Tags:

No tags generated

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: pascals-triangle

Code

pascals_triangle <- function(n) {

}

Tags:

construct:parameter

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: pascals-triangle

Code

pascalsTriangle <- function(n) {
  
  if (n == 0) {
    return(list())
  }
  else if (n == 1) {
    return(list(1))
  }
  else if (n == 2) {
    return(list(1, c(1,1)))
  }
  else if (n >= 3) {
    triangle <- list(1, c(1,1))
    for (x in 3:n) {
      row <- rep(1, x)
      for (i in 2:(x - 1)) {
        row[i] = sum(triangle[[x - 1]][(i - 1):i])
      }
      triangle[[x]] = row
    }
    return(triangle)
  }
  else {
    stop("argument n needs to be an integer")
  }
  
}

Tags:

construct:assignment
construct:break
construct:double
construct:for-loop
construct:if
construct:implicit-conversion
construct:indexing
construct:integer
construct:list
construct:number
construct:parameter
construct:return
construct:subtract
construct:vector
paradigm:imperative
paradigm:functional
technique:boolean-logic
technique:looping

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: nucleotide-count

Code

nucleotide_count <- function(input) {

}

Tags:

construct:parameter

@iHiD
Member Author

iHiD commented Nov 1, 2023

Exercise: pangram

Code

is_pangram <- function(input) {

}

Tags:

construct:parameter

@jonmcalder
Member

Tag editing underway...

@ErikSchierboom
Member

This is an automated comment

Hello 👋 Next week we're going to start using the tagging work people are doing on these. If you've already completed the work, thank you! If you've not, but intend to this week, that's great! If you're not going to get round to doing it, and you've not yet posted a comment letting us know, could you please do so, so that we can find other people to do it. Thanks!

@jonmcalder
Member

I did a run-through and a light-touch edit of the tags, but they're likely not very comprehensive and I'm sure there are still inconsistencies, so I would appreciate it if anyone else is able to check and improve them.

A number of these were either test files or empty solution stubs, so I wasn't sure whether we want to tag those, since they could be used but obviously aren't representative training data.

@exercism exercism deleted a comment from iHiD Nov 22, 2023
@exercism exercism deleted a comment from iHiD Nov 22, 2023
@ErikSchierboom
Member

Thanks for the help! We've updated the tags.
