Search text for a prefix and list all its suffixes in the text


17

I'm using "suffix" loosely here to mean "any substring that follows the prefix".

"Prefix" here means the START of a word, where a word's start is defined as either after a space or from the first character of the input text (for the first word). A "prefix" in the middle of a word is ignored.

E.g. if your input prefix is "arm" and the input text is "Dumbledore's army was fully armed for the impending armageddon", then the output list contains (y, ed, ageddon).
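A minimal reference sketch of the rule above in Python, assuming that only the space character separates words (as the test-case notes state). The function name `suffixes` is made up for illustration and is not taken from any answer.

```python
def suffixes(prefix, text):
    """Return what follows `prefix` in every space-delimited word that starts with it."""
    # Split on the space character only, so tabs and newlines stay inside words.
    words = text.split(" ")
    return [word[len(prefix):] for word in words if word.startswith(prefix)]


print(suffixes("1", "He1in aosl 1ll j21j 1lj2j 1lj2 1ll l1j2i"))
# ['ll', 'lj2j', 'lj2', 'll']
```

Duplicates are kept here, which the challenge allows.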

Test Cases

Assume case-sensitive; strings end after spaces. The input won't start with a space.

Removing duplicates is optional.


Input prefix: "1"

Input text:

"He1in aosl 1ll j21j 1lj2j 1lj2 1ll l1j2i"

Output: (ll, lj2j, lj2) - in any permutation

Input prefix: "frac"

Input text: 

"fracking fractals fracted fractional currency fractionally fractioned into fractious fractostratic fractures causing quite a fracas"

Output: (king, tals, ted, tional, tionally, tioned, tious, tostratic, tures, as)

Input prefix: "href="https://www.astrotheme.com/astrology/"

Input text: 

"(div style="padding: 0; background: url('https://www.astrotheme.com/images/site/arrondi_450_hd.png') no-repeat; text-align: left; font-weight: bold; width: 450px; height: 36px")
  (div class="titreFiche" style="padding: 5px 0 0 6px")(a href="https://www.astrotheme.com/astrology/Nolwenn_Leroy" title="Nolwenn Leroy: Astrology, birth chart, horoscope and astrological portrait")Nolwenn Leroy(br /)
(/div)
  (div style="text-align: right; border-left: 1px solid #b2c1e2; border-right: 1px solid #b2c1e2; width: 446px; padding: 1px 1px 0; background: #eff8ff")
    (table style="width: 100%")(tr)(td style="width: 220px")
(div style="padding: 0; background: url('https://www.astrotheme.com/images/site/arrondi_450_hd.png') no-repeat; text-align: left; font-weight: bold; width: 450px; height: 36px")
  (div class="titreFiche" style="padding: 5px 0 0 6px")(a href="https://www.astrotheme.com/astrology/Kim_Kardashian" title="Kim Kardashian: Astrology, birth chart, horoscope and astrological portrait")Kim Kardashian(br /)(span style="font-weight: normal; font-size: 11px")Display her detailed horoscope and birth chart(/span)(/a)(/div)
(/div)
(div style="padding: 0; background: url('https://www.astrotheme.com/images/site/arrondi_450_hd.png') no-repeat; text-align: left; font-weight: bold; width: 450px; height: 36px")
  (div class="titreFiche" style="padding: 5px 0 0 6px")(a href="https://www.astrotheme.com/astrology/Julia_Roberts" title="Julia Roberts: Astrology, birth chart, horoscope and astrological portrait")Julia Roberts(br /)(span style="font-weight: normal; font-size: 11px")Display her detailed horoscope and birth chart(/span)(/a)(/div)
    (td id="cfcXkw9aycuj35h" style="text-align: right")
  (/div)"

Output: (Nolwenn_Leroy", Kim_Kardashian", Julia_Roberts")

Winner

This is code-golf, so the fewest bytes wins. :)

You can accept the inputs in whichever way works, as long as your code can solve arbitrary problems like the test cases.


2
To be clear, does the prefix have to be at the start of a word? If the second test case had the word 'diffraction' in it, would that change the output?
sundar - Reinstate Monica

2
How can https://www.astrotheme.com/astrology/ be a prefix when it's preceded by href="?
Neil

1
Can the suffix be empty?
user202729

1
I would suggest allowing people to split on other white-space as well as spaces, since a few people seem to be doing so anyway. I would also suggest saying that the input won't have multiple spaces in a row (or, somewhat equivalently, that empty words may result in undefined behaviour). I suggest both of these since the main part of the challenge is not the splitting into words (I'd suggest allowing a list of words, or even just a single word, as the input, but it's too late now with 22 answers; something to note for future challenges though).
Jonathan Allan

1
-1 for allowing splitting on other whitespace now. It would have made sense for the challenge to be that way originally, but changing it now would split the answers into those doing two different things. And this isn't like the cases where some languages can't handle e.g. 64-bit numbers or something; here it just means implementing a slightly (possibly) more complex match, so it makes more sense to correct the answers with wrong assumptions and maybe add a test case to check for this too.
sundar - Reinstate Monica

Answers:


5

R, 63 bytes

function(s,p,z=el(strsplit(s,' ')))sub(p,'',z[startsWith(z,p)])

Try it online!

The positive lookbehind implementation is unfortunately 5 bytes longer due to the bulky regmatches/gregexpr combination:

function(s,p)regmatches(s,gregexpr(paste0('(?<=',p,')[^ ]*'),s,,T))
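
For comparison, a hedged Python sketch of the same lookbehind idea. The R pattern above only checks that the prefix sits immediately before the match; this version additionally insists on a word start, which is one reading of the challenge and not part of the R answer. `suffixes_regex` is a hypothetical name.

```python
import re


def suffixes_regex(prefix, text):
    # (?<![^ ]) succeeds only at the start of the text or right after a space.
    # re.escape keeps characters such as "." or "/" in the prefix literal.
    pattern = r"(?<![^ ])" + re.escape(prefix) + r"([^ ]*)"
    # With a single capture group, findall returns just the captured suffixes.
    return re.findall(pattern, text)


print(suffixes_regex("frac", "diffraction fracas"))  # ['as']
```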

2
A plain sub(grep()) is slightly better than the lookbehind, at 66, but still doesn't encroach on startsWith(). I don't see much room for improvement here without a change of approach. Try it online!
CriminallyVulgar


4

Japt, 9 bytes

8 bytes if we can take input as an array of words.

¸kbV msVl
¸         // Shorthand for `qS`, split into words.
 kbV      // Filter the words, selecting only those that start with the prefix.
     msVl // For each remaining word, remove prefix length chars from the start.

Try it online!


Very nice, but it doesn't seem to work for the last test case. Might it be due to the quotation marks inside the string? Or the new lines?
DrQuarius

@DrQuarius Your last test case is faulty, isn't it? All the strings you're looking for are in the middle of words (surrounded by url('')), none of them are at the start.
Nit


4

C (gcc), 113 109 106 105 bytes

-4 bytes thanks to @LambdaBeta!
-3 bytes thanks to @WindmillCookies!

i;f(char*s,char*t){for(i=strlen(s);*t;t++)if(!strncmp(t,s,i))for(t+=i,puts("");*t^32&&*t;)putchar(*t++);}

Try it online!


1
You can save 4 bytes by removing both ^0's. Just ;*t; and &&*t;
LambdaBeta

@LambdaBeta thanks! I missed that.
betseg

1
I was able to get it down to 107 using a different strategy, sorry :)
LambdaBeta

@LambdaBeta I actually thought of that method but I didn't think it would've been shorter than the solution that I posted. Nice answer, upvoted.
betseg

1
used puts instead of putchar, now is 107, outputs on different lines: tio.run/…
Windmill Cookies

3

Japt, 16 12 bytes

Port of Arnauld's answer

-4 bytes from @Shaggy

iS qS+V Å®¸g

iS                  Insert S value (S = " ") at beginning of first input (Implicit)
   q                split using
    S+V             S + Second input
        Å           slice 1
         ®          map
          ¸         split using S
           g        get first position
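
The steps listed above can be sketched in Python as follows. This is an illustration of the split trick under the assumption that the prefix is non-empty; `suffixes_by_split` is a made-up name.

```python
def suffixes_by_split(text, prefix):
    # Prepending a space makes the first word look like every other word,
    # so " " + prefix marks exactly the places where a word starts with prefix.
    chunks = (" " + text).split(" " + prefix)
    # chunks[0] is whatever precedes the first hit; each later chunk begins
    # with a suffix, which ends at the next space.
    return [chunk.split(" ")[0] for chunk in chunks[1:]]


print(suffixes_by_split("fracfracit unfracked", "frac"))  # ['fracit']
```

No explicit "starts with" test is needed, because the separator itself encodes the word boundary.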

Try it online!



Should probably make mention that this is a port of Arnauld's solution. (Assuming it wasn't independently derived, of course)
Shaggy

@Shaggy Honestly I did not notice this was the same answer, anyway I'll give him credit. sorry
Luis felipe De jesus Munoz

There's a 9 byte solution if you want to give it a try.
Shaggy

@Shaggy Did you mean this or did you have something different in mind?
Nit

3

05AB1E, 11 bytes

#ʒηså}εsgF¦

Try it online! (here is a demo for multiline strings)

How does it work?

#ʒηså}εsgF¦    Full program.
#              Split the first input by spaces.
 ʒ   }         Filter the words by ...
  ηså          ... "Does the second input occur in the prefixes of the word?"
      ε        And for each valid word
       sg      Retrieve the length of the second input.
         F¦    And drop the first character of the word that number of times.

:) Very nice, thanks for the multiline demo! I think that was causing issues for other programs.
DrQuarius

3

Stax, 8 bytes

·B¬╤²*6&

Run and debug it

Explanation:

j{x:[fmx|- Full program, implicit input: On stack in order, 1st input in X register
j          Split string on spaces
 {   f     Filter:
  x:[        Is X a prefix?
      m    Map passing elements:
       x|-   Remove all characters in X the first time they occur in the element
             Implicit output

I could also use x%t (length of X, trim from left), which is equally long but packs to 9 bytes.
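
A rough Python rendering of the "remove each character the first time it occurs" step, assuming `x|-` behaves as the explanation describes. The function names are hypothetical.

```python
def strip_prefix_chars(word, prefix):
    # Remove the first remaining occurrence of each prefix character in turn.
    # When word starts with prefix, that occurrence is always at index 0,
    # so the net effect is the same as dropping len(prefix) characters.
    for ch in prefix:
        word = word.replace(ch, "", 1)
    return word


def suffixes_stax_style(prefix, text):
    return [strip_prefix_chars(w, prefix) for w in text.split(" ") if w.startswith(prefix)]


print(suffixes_stax_style("frac", "fracfracit unfracked fracas"))  # ['fracit', 'as']
```

The character-by-character removal is only safe because the filter has already guaranteed that the word starts with the prefix.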


Beautiful. :) I think this might be the winner. Most of the lowest byte score contenders haven't been able to parse the third test case. :)
DrQuarius

Ahhh... but I see how you've done it now, you had to let the program know that the quotation marks in the string were not part of the program. I think that's fine. Also, yours is still the shortest regardless. :)
DrQuarius

3

Retina, 31 bytes

L`(?<=^\2¶(.|¶)*([^ ¶]+))[^ ¶]+

Try it online! The first line should be the desired prefix; the rest is the input text. Does not remove duplicates. Would be 25 bytes if any white space were a valid separator.

Explanation: We want to list the suffixes of valid prefixes. The [^ ¶]+ matches the suffix itself. The prefix of the regexp is a lookbehind that ensures that the prefix of the suffix is the input prefix. As a lookbehind is evaluated right-to-left, this starts by matching the prefix (using the same pattern but inside ()s to capture it), then any characters, before finally matching the prefix on its own line at the beginning of the input.


White space meaning spaces and/or line breaks? I think that's a valid solution if so, but to be fair to all I will leave the problem as stated.
DrQuarius

@DrQuarius No, any white space includes tabs, formfeeds and even ellipses.
Neil

Retina was the first language that came to mind when I saw the post (though I don't know the language yet). I thought it would be shorter though. Could I bother you for an explanation? For eg. the docs say ¶ is a newline character, but I can't figure out why so many are needed here.
sundar - Reinstate Monica

@sundar Sorry I was in a bit of a rush at the time. The first ¶ ensures that the whole first line is matched to the prefix. The second ¶ is needed because it isn't known how many intermediate lines there are. The last two ¶s work the same way - negated character classes normally include newlines but we don't want that here.
Neil

No problem, thanks for adding it in. "normally include newlines but we don't want that here" <- If I understand correctly, we do want that here. OP specifies strictly that only spaces count as separators, that prefixes begin at and suffixes end at spaces. So for eg. "dif\nfractional" shouldn't match for "frac" because the prefix comes after a newline, not a space. Similarly "fracture-\nrelated" should return suffix "ture-\nrelated". Which is good news here I think, because you can remove at least one ¶, possibly more.
sundar - Reinstate Monica
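
The two readings discussed in this thread can be compared side by side. A small Python sketch, with hypothetical function names, using the two inputs from the comment above:

```python
def suffixes_strict(prefix, text):
    # Only the space character separates words.
    return [w[len(prefix):] for w in text.split(" ") if w.startswith(prefix)]


def suffixes_any_whitespace(prefix, text):
    # str.split() with no argument splits on any run of whitespace.
    return [w[len(prefix):] for w in text.split() if w.startswith(prefix)]


print(suffixes_strict("frac", "dif\nfractional"))          # []
print(suffixes_any_whitespace("frac", "dif\nfractional"))  # ['tional']
print(suffixes_strict("frac", "fracture-\nrelated"))       # ['ture-\nrelated']
```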

3

Brachylog, 24 21 bytes

tṇ₁W&h;Wz{tR&h;.cR∧}ˢ

Try it online!

Could have been a few bytes shorter if there was variable sharing with inline predicates.

Input is an array with the prefix as the first element and the text as the second element.

tṇ₁W                    % Split the text at spaces, call that W
    &h;Wz               % Zip the prefix with each word, to give a list of pairs
         {         }ˢ   % Select the outputs where this predicate succeeds:
          tR            % Call the current word R
            &h;.c       % The prefix and the output concatenated
                 R      % should be R
                  ∧     % (No more constraints on output)

2

IBM/Lotus Notes Formula, 54 bytes

c:=@Explode(b);@Trim(@If(@Begins(c;a);@Right(c;a);""))

Takes its input from two fields named a and b. Works because Formula will recursively apply a function to a list without the need for a @For loop.

No TIO available so here's a screenshot:

[screenshot]


2

APL (Dyalog Unicode), 23 bytes (SBCS)

Full program. Prompts for text and prefix from stdin. Prints list to stdout.

(5⌽'(\w+)\b',⎕)⎕S'\1'⊢⎕

Try it online!

⎕ prompt (for text)

⊢ yield that (separates '\1' from ⎕)

(…)⎕S'\1' PCRE Search and return list of capture group 1 from the following regex:

⎕ prompt (for prefix)

'(\w+)\b', prepend this string (group of word characters followed by a word boundary)

5⌽ rotate the first 5 characters to the end; '\bPREFIX(\w+)'
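
The rotation trick can be mimicked in Python to see which regex it builds. This is only a sketch: `rotate_left` and `suffixes_word_boundary` are made-up names, and the prefix is used unescaped, as in the APL code, so it is assumed to contain no regex metacharacters.

```python
import re


def rotate_left(s, n):
    # Move the first n characters to the end, like n⌽s on a string.
    return s[n:] + s[:n]


def suffixes_word_boundary(prefix, text):
    # r"(\w+)\b" + prefix rotated by 5 gives r"\bPREFIX(\w+)".
    pattern = rotate_left(r"(\w+)\b" + prefix, 5)
    return re.findall(pattern, text)


print(rotate_left(r"(\w+)\b" + "frac", 5))  # \bfrac(\w+)
```

Note that `\b` also matches after punctuation such as a hyphen, so this pattern is broader than "start of text or after a space", and `\w+` requires a non-empty suffix.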


2

C (clang), 107 bytes

i;f(s,t,_)char*s,*t,*_;{i=strlen(s);_=strtok(t," ");while((strncmp(_,s,i)||puts(_+i))&&(_=strtok(0," ")));}

Try it online!

Description:

i;f(s,t,_)char*s,*t,*_;{   // f takes s and t; uses i (int) and s, t, _ (char*)
    i=strlen(s);           // save strlen(s) in i
    _=strtok(t," ");       // set _ to the first word of t
    while(                 // while loop
        (strncmp(_,s,i)||  // short-circuited if (if _ doesn't match s to i places)
         puts(_+i))        // print _ starting at the i'th character
        &&                 // the previous expression always returns true
        (_=strtok(0," "))) // set _ to the next word of t
    ;                      // do nothing in the actual loop
}

Has to be clang because gcc segfaults without #include <string.h> due to strtok problems.



2

MATL, 17 bytes

Yb94ih'(.*)'h6&XX

Try it on MATL Online

How?

Yb - Split the input at spaces, place the results in a cell array

94 - ASCII code for ^ character

ih - Get the input (say "frac"), concatenate '^' and the input

'(.*)'h - Push the string '(.*)' onto the stack, concatenate '^frac' and '(.*)'. So now we have '^frac(.*)', a regex that matches "frac" at the beginning of the string and captures whatever comes after.

6&XX - Run regexp matching, with 6& specifying 'Tokens' mode i.e., the matched capture groups are returned instead of the entire match.

Implicitly output the results.
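
The same per-word "anchored regex plus capture group" approach, sketched in Python. The name `suffixes_tokens` is hypothetical, and `re.escape` is an addition that the MATL code does not have.

```python
import re


def suffixes_tokens(prefix, text):
    # re.DOTALL lets "." cover newlines, which can sit inside a word when
    # only spaces separate words.
    pattern = re.compile("^" + re.escape(prefix) + "(.*)", re.DOTALL)
    result = []
    for word in text.split(" "):
        m = pattern.match(word)
        if m:
            result.append(m.group(1))  # the capture group, i.e. the "token"
    return result


print(suffixes_tokens("1", "He1in aosl 1ll j21j 1lj2j 1lj2 1ll l1j2i"))
# ['ll', 'lj2j', 'lj2', 'll']
```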


So that's what 'Tokens' does; good to know!
Luis Mendo

1
Haha. I had no idea either, figured it out by trial and error for this answer.
sundar - Reinstate Monica


2

PowerShell 3.0, 60 62 59 bytes

param($p,$s)-split$s|%{if($_-cmatch"^$p(.*)"){$Matches[1]}}

Lost some bytes suppressing the cmatch output. I had a janky solution that saved some bytes by purposely causing duplicates, but it also threw red error text when the first word didn't match, which is not fine now that I think about it. It costs +2 bytes to fix.


The 60-byte solution returns duplicate answers in some cases (king, tals, ted, tional, tional, tionally, tioned, tioned, tious, tostratic, tures, tures, tures, tures, as) and shows an index error on the He1in example. Tested on PowerShell 5.1 and 6.0.2. The 62-byte solution is OK.
mazzy

1
@mazzy I knew that, I was just abusing the "Duplicates is allowed" bit to have it return even more duplicates when it comes across a no-match and throw red on a no-match 1st iteration.
Veskah

1

JavaScript (ES6), 57 bytes

Takes input in currying syntax (text)(prefix). Does not remove duplicates.

s=>p=>(' '+s).split(' '+p).slice(1).map(s=>s.split` `[0])

Try it online!




1

Husk, 11 bytes

Pretty much just a port of the Haskell answer:

m↓L⁰foΠz=⁰w

Try it online!

Explanation

m↓L⁰f(Πz=⁰)w  -- prefix is explicit argument ⁰, the other one implicit. eg: ⁰ = "ab" and implicit "abc def"
           w  -- words: ["abc","def"]
    f(    )   -- filter by (example w/ "abc"
       z=⁰    -- | zip ⁰ and element with equality: [1,1]
      Π       -- | product: 1
              -- : ["abc"]
m             -- map the following
 ↓            -- | drop n elements
  L⁰          -- | n being the length of ⁰ (2)
              -- : ["c"]
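
The zip-and-product filter translates fairly directly to Python. A sketch with hypothetical names; Python's `zip` stops at the shorter input, so an explicit length check is added here as a precaution.

```python
def starts_with_zip(prefix, word):
    # zip stops at the shorter string, so check the length explicitly;
    # otherwise "a" would count as starting with "ab".
    return len(word) >= len(prefix) and all(a == b for a, b in zip(prefix, word))


def suffixes_zip(prefix, text):
    # Keep matching words, then drop len(prefix) characters from each.
    return [w[len(prefix):] for w in text.split(" ") if starts_with_zip(prefix, w)]


print(suffixes_zip("ab", "abc def"))  # ['c']
```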

1

Jelly,  11  9 bytes

Ḳœṣ€ḢÐḟj€

A dyadic link accepting the text (a list of characters) on the left and the prefix (a list of characters) on the right which yields a list of lists of characters (the resulting suffixes).

Try it online! (footer joins with spaces to avoid full-program's implicit smashing)
Note: I added three edge cases to the string in the OP - unfrackled and nofracfracheremate to the beginning, which should not output and fracfracit to the end which should output fracit.

How?

Ḳœṣ€ḢÐḟj€ - Link: text, prefix                        e.g. "fracfracit unfracked", "frac"
Ḳ         - split (text) at spaces -> list of words        ["fracfracit", "unfracked"]
   €      - for each (word):
 œṣ       -   split around sublists equal to (prefix)       ["","","it"]  ["un","ked"]
     Ðḟ   - filter discard items for which this is truthy:
    Ḣ     -   head
          -   -- Crucially this modifies the list:             ["","it"]       ["ked"]
          -   -- and yields the popped item:                 ""            "un"
          -   -- and only non-empty lists are truthy:       kept          discarded
          -            ...so we end up with the list:      [["","it"]]
        € - for each (remaining list of lists of characters):
       j  -   join with the prefix                          "fracit"                                             
          -                                                ["fracit"]
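
The split-then-rejoin idea from the walkthrough above, sketched in Python under the assumption of a non-empty prefix. `suffixes_split_join` is a made-up name.

```python
def suffixes_split_join(text, prefix):
    # Assumes a non-empty prefix: str.split raises ValueError on an empty separator.
    result = []
    for word in text.split(" "):
        parts = word.split(prefix)
        # An empty first part means nothing precedes the first occurrence;
        # more than one part means the prefix occurred at all.
        if len(parts) > 1 and parts[0] == "":
            # Rejoining the tail restores later copies of the prefix.
            result.append(prefix.join(parts[1:]))
    return result


print(suffixes_split_join("fracfracit unfracked", "frac"))  # ['fracit']
```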

previous 11 byter:

Ḳs€L}Ḣ⁼¥ƇẎ€

Also a dyadic link as above.

Try it online!


1

Perl 5 with -asE, 23 22 21 bytes (?)

say/^$b(.*)/ for@F

Try it online!

Can be run as a commandline one-liner as perl -asE 'say/^$b(.*)/ for@F' -- -b=frac -, or with a filename in place of the last -.
Or from a script file, say perl -as -M5.010 script.pl -b=frac - (thanks to @Brad Gilbert b2gills for the TIO link demonstrating this).

The code itself is 18 bytes, I added 3 bytes for the -b= option which assigns its value (the prefix input) to a variable named $b in the code. That felt like an exception to the usual "flags aren't counted" consensus.

-a splits each input line at spaces and places the result in the array @F. -s is a shortcut way of assigning a command-line argument as a variable, by giving a name on the command-line. Here the argument is -b=frac, which places the prefix "frac" in a variable $b.

/^$b(.*)/ - Matches the value of $b at the beginning of the string. .* is whatever comes after that, until the end of the word, and the surrounding parentheses capture this value. The captured values are automatically returned, to be printed by say. Iterating through space-separated words with for @F means we don't have to check for initial or final spaces.



1

Perl 6, 30 bytes

{$^t.comb: /[^|' ']$^p <(\S+/}

Test it

Expanded:

{  # bare block lambda with placeholder params $p, $t

  $^t.comb:    # find all the substrings that match the following
  /
    [ ^ | ' ' ] # beginning of string or space
    $^p        # match the prefix
    <(         # don't include anything before this
    \S+        # one or more non-space characters (suffix)
  /
}

@sundar fixed
Brad Gilbert b2gills

You seem to have an extra space between 'p' and '<' btw.
sundar - Reinstate Monica

@sundar The space between p and <( is necessary as otherwise it may be seen as $v<…> which is short for $v{qw '…'}.
Brad Gilbert b2gills

1
Seems to work without it though, at least in this case.
sundar - Reinstate Monica

1
@sundar Technically it just warns, but I don't like writing code that warns when it is only one byte different than code that doesn't warn.
Brad Gilbert b2gills

1

Java 10, 94 bytes

p->s->{for(var w:s.split(" "))if(w.startsWith(p))System.out.println(w.substring(p.length()));}

Try it online here.

Ungolfed:

p -> s -> { // lambda taking prefix and text as Strings in currying syntax
    for(var w:s.split(" ")) // split the String into words (delimited by a space); for each word ...
        if(w.startsWith(p)) //  ... test whether p is a prefix ...
            System.out.println(w.substring(p.length())); // ... if it is, output the suffix
}

1

Small Basic, 242 bytes

A Script that takes no input and outputs to the TextWindow Object

c=TextWindow.Read()
s=TextWindow.Read()
i=1
While i>0
i=Text.GetIndexOf(s," ")
w=Text.GetSubText(s,1,i)
If Text.StartsWith(w,c)Then
TextWindow.WriteLine(Text.GetSubTextToEnd(w,Text.GetLength(c)+1))
EndIf
s=Text.GetSubTextToEnd(s,i+1)
EndWhile

Try it at SmallBasic.com! Requires IE/Silverlight



1

Brachylog, 12 bytes

hṇ₁∋R&t;.cR∧

Try it online!

Takes input as [text, prefix] through the input variable, and generates each word through the output variable. This was originally sundar's answer, which I started trying to golf after reading that it "could have been a few bytes shorter if there was variable sharing with inline predicates", which is possible now. Turns out that generator output saves even more bytes.

    R           R
   ∋            is an element of
h               the first element of
                the input
 ṇ₁             split on spaces,
     &          and the input
      t         's last element
         c      concatenated
       ;        with
        .       the output variable
          R     is R
           ∧    (which is not necessarily equal to the output).

My first two attempts at golfing it down, using fairly new features of the language:

With the global variables that had been hoped for: hA⁰&tṇ₁{∧A⁰;.c?∧}ˢ (18 bytes)

With the apply-to-head metapredicate: ṇ₁ᵗz{tR&h;.cR∧}ˢ (16 bytes)

And my original solution:

Brachylog, 15 bytes

ṇ₁ʰlᵗ↙X⟨∋a₀⟩b↙X

Try it online!

Same I/O. This is essentially a generator for words with the prefix, ṇ₁ʰ⟨∋a₀⟩, modified to remove the prefix.

                   The input variable
  ʰ                with its first element replaced with itself
ṇ₁                 split on spaces
    ᵗ              has a last element
   l               the length of which
     ↙X            is X,
       ⟨   ⟩       and the output from the sandwich
       ⟨∋  ⟩       is an element of the first element of the modified input
       ⟨ a₀⟩       and has the last element of the input as a prefix.
                   The output variable
       ⟨   ⟩       is the output from the sandwich
            b      with a number of characters removed from the beginning
             ↙X    equal to X.

A very different predicate with the same byte count:

Brachylog, 15 bytes

hṇ₁∋~c₂Xh~t?∧Xt

Try it online!

Same I/O.

   ∋               An element of
h                  the first element of
                   the input variable
 ṇ₁                split on spaces
    ~c             can be un-concatenated
      ₂            into a list of two strings
       X           which we'll call X.
        h          Its first element
         ~t        is the last element of
           ?       the input variable,
            ∧      and
             Xt    its last element is
                   the output variable.
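
The "un-concatenate into two strings" predicate can be imitated with a Python generator that tries every cut point. This is a loose analogy, not a translation of Brachylog's search; the name `suffixes_unconcat` is hypothetical.

```python
def suffixes_unconcat(text, prefix):
    # Generator: yields one suffix at a time, like the predicate's output.
    for word in text.split(" "):
        # Try every way of cutting the word into head + tail.
        for cut in range(len(word) + 1):
            head, tail = word[:cut], word[cut:]
            if head == prefix:
                yield tail


print(list(suffixes_unconcat("He1in aosl 1ll j21j 1lj2j 1lj2 1ll l1j2i", "1")))
# ['ll', 'lj2j', 'lj2', 'll']
```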


0

Pyth, 21 20 18 17 16 bytes

AQVcH)IqxNG0:NG"

Try it online!

-1 by using V instead of FN because V implicitly sets N

-2 after some further reading about string slicing options

-1 using x to check for the presence of the substring at index 0

-1 using replace with "" for getting the end of the string

I'm sure this could use some serious golfing but as a Pyth beginner, just getting it to work was a bonus.

How does it work?

assign('Q',eval_input())
assign('[G,H]',Q)
for N in num_to_range(chop(H)):
    if equal(index(N,G),0):
        imp_print(at_slice(N,G,""))
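
A plain Python sketch of the same "index is 0, then replace with the empty string" logic. The name `suffixes_find_replace` is made up, and the sketch splits on spaces only. How Pyth's replace treats repeated occurrences is not checked here; in Python, `str.replace` without a count removes every occurrence, so the sketch passes a count of 1.

```python
def suffixes_find_replace(prefix, text):
    result = []
    for word in text.split(" "):
        # find(...) == 0 means the first occurrence of prefix is at index 0.
        if word.find(prefix) == 0:
            # The count of 1 removes only the leading copy of the prefix.
            result.append(word.replace(prefix, "", 1))
    return result


print(suffixes_find_replace("frac", "fracfracit unfracked"))  # ['fracit']
```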

0

Excel VBA, 86 bytes

Takes input as prefix in [A1] and values in [B1] and outputs to the console.

For each w in Split([B1]):?IIf(Left(w,[Len(A1)])=[A1],Mid(w,[Len(A1)+1])+" ","");:Next
Licensed under cc by-sa 3.0 with attribution required.