100+ |
I want to find all repeating patterns that start with 1-2 digits, contain some text, strange characters and numbers, and finish with 4 digits.
For example:
import re
ref=""" 1. Lieber, C. M. The incredible shrinking circuit. Sci. Am. 285, 58±64 (2001).
2. Cui, Y. & Lieber, C. M. Functional nanoscale electronic devices assembled using silicon nanowire
building blocks. Science 291, 851±853 (2001).
3. Wang, J. F., Gudiksen, M. S., Duan, X. F., Cui, Y. & Lieber, C. M. Highly polarized photolumi-
nescence and photodetection from single indium phosphide nanowires. Science 293, 1455±1457
(2001)."""
# raw string, so the backslashes reach the regex engine unchanged
quer=r"\d{1,2}\.\s+.+\(\d{4}\)\."
ansre=re.findall(quer,ref,re.DOTALL)
print(len(ansre))
It finds only one match.
If I use
I find 23 matches, but of course this captures only the beginning of the pattern, i.e. the 1-2 digits.
If I add .+, I find only one match:
I do not know how to say that any character can appear between the 1-2 digits and the 4 digits in the pattern. If I use .+ for this, the regex captures the full text and does not stop at the first 4 digits.
Thank you!
Expert Mod 8TB |
You need to define what you mean by "contains some text" and "strange characters". Those terms are too vague and I have no idea what you mean by them.
But if your goal is to break out those references, then you can use this: \d{1,2}\.[^(]+\(\d{4}\)\.
This is assuming that your test data is representative and there are never any parentheses except for the ones around the year at the end of the reference.
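A minimal sketch of that suggestion, run against the three sample references from the question. The variable names are only illustrative. Because [^(] also matches newlines, re.DOTALL is not needed here.

```python
import re

# Sample text from the question: three references, two of them wrapped over lines.
ref = """1. Lieber, C. M. The incredible shrinking circuit. Sci. Am. 285, 58±64 (2001).
2. Cui, Y. & Lieber, C. M. Functional nanoscale electronic devices assembled using silicon nanowire
building blocks. Science 291, 851±853 (2001).
3. Wang, J. F., Gudiksen, M. S., Duan, X. F., Cui, Y. & Lieber, C. M. Highly polarized photolumi-
nescence and photodetection from single indium phosphide nanowires. Science 293, 1455±1457
(2001)."""

# [^(]+ consumes everything up to the first "(", so each match stops at its own year.
pattern = r"\d{1,2}\.[^(]+\(\d{4}\)\."
matches = re.findall(pattern, ref)

print(len(matches))  # 3
for m in matches:
    print(repr(m[:30]))
```

As noted, this only holds while the year is the only parenthesised part of a reference.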
| | 100+ |
The problem is that there can be parentheses in the text.
I do not know how to define "any character" between
and
. If I use
it captures all the text, but I need to capture as many entries as there are, each starting with
and ending with
| | 100+ |
I have tried this: \d+\.\s+[\sa-zA-Z0-9±&,:;.)(-]+
but it captures only 3 of the 21 entries (numbers 3, 11 and 15 in the text below). I do not know why. The full text to be searched looks like this:
ref="""
1. Haraguchi, K., Katsuyama, T., Hiruma, K. & Ogawa, K. GaAs p-n junction formed in quantum wire
crystals. Appl. Phys. Lett. 60, 745 747 (1992).
2. Björk, M. T. et al. Nanowire resonant tunneling diodes. App. Phys.Lett. 81, 4458±4460 (2002).
3. Thelander, C. et al. Single electron transistors in heterostructure nanowires. Appl. Phys. Lett. 83,
2052±2054 (2003).
4. Wagner,R. S.in Whisker Technology (ed. Levitt, A. P.) 47±119 (Wiley. New York, 1970)
5. Hiruma, K. et al. Growth and optical properties of nanometer scale GaAs and InAs whiskers. J. Appl.
Phys. 77, 447±462 (1995).
6. Duan, X. & Lieber, C. M. General synthesis of compound semiconductor nanowires. Adv. Mater. 12,
298±302 (2000).
7. Duan, X. & Lieber, C. M. Laser assisted catalytic growth of single crystal GaN nanowires. J. Am. Chem.
Soc. 122, 188±189 (2000).
8. Ohlsson, B. J. et al. Size , shape , and position controlled GaAs nano whiskers. Appl. Phys. Lett. 79,
3335±3337 (2001).
9. Kamins, T. I., Stanley Williams, R., Basile, D. P., Hesjedal, T. & Harris, J. S. Ti catalyzed Si nanowires by
chemical vapor deposition: Microscopy and growth mechanisms. J. Appl. Phys. 89, 1008±1016 (2001).
10. Ohlsson, B. J. et al. Growth and characterization of GaAs and InAs nano whiskers and InAs/GaAs
heterostructures. Physica E 13, 1126±1130 (2002).
11. Buffat, P. & Borel, J. P. Size effect on the melting temperature of gold particles. Phys. Rev. A 13,
2287±2298 (1976).
12. Björk, M. T. et al. One dimensional steeplechase for electrons realized. Nano Lett. 2, 87±89 (2002).
13. Gudiksen, M. S., Lauhon, L. J., Wang, J., Smith, D. S. & Lieber, C. M. Growth of nanowires superlattice
structures for nanoscale photonics and electronics. Nature 415, 617±620 (2002).
14. Wu, Y., Fan, R. & Yang, P. Block by block growth of single crystalline Si/SiGe superlattice nanowires.
Nano Lett. 2, 83±86 (2002).
15. Baker, R. T. K. Catalytic growth of carbon filaments. Carbon 27, 315±323 (1989).
16. Helveg, S. et al. Atomic scale imaging of carbon nanofibre growth. Nature 427, 426±429 (2004).
17. Massalski, T. B. (ed.) Binary Alloy Phase Diagrams 2nd edn Vol. 1 369±371 (ASM International,
Materials Park, Ohio, 1990).
18. Gupta, R. P., Khokle, W. S., Wuerfl, J. & Hartnagel, H. L. Diffusion of gallium in thin gold films on
GaAs. Thin Solid Films. 151, L121±L125 (1987).
19. Massalski, T. B. (ed.) Binary Alloy Phase Diagrams 1st edn Vol. 1 191±192 (ASM International, Metals
Park, Ohio, 1986).
20. Magnusson, M. H., Deppert, K., Malm, J. O., Bovin, J. O. & Samuelson, L. Gold nanoparticles:production, reshaping, and thermal charging. J. Nanoparticle Res. 1, 243±251 (1999).
21. Bakkers, E. & Verheijen, M. A. Synthesis of InP nanotubes. J. Am. Chem. Soc. 125, 3440±3441 (2003). """
| | 100+ |
I found out why the regex captured entries 3, 11 and 15: because they contain /
If I use \d+\.\s+[\sa-zA-Z0-9/±&,:;.)(öÖäÄåÅ-]*\d{4}\)\.
all the text is captured, the same as if I were using
So the question remains: how do I define any character between two other sets of characters? In my case I want to capture 21 entries.
| | Expert Mod 8TB |
By default, regex uses greedy matching. That's why it doesn't stop after the first occurrence of a match. You need to tell it to be non-greedy.
You can do that by putting a ? after the quantifier (*, + or ?), for example .+? instead of .+
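Here is a small sketch of the difference, using entries 3 to 5 from the posted text. The variable names are only illustrative. Note that entry 4 ends with "1970)" and no trailing period, so a pattern that ends in \d{4}\)\. merges it with entry 5 even when it is non-greedy. The third pattern is an alternative that is not from this thread: it assumes every reference starts at the beginning of a line with a number, a period and whitespace, and it cuts each entry where the next one begins.

```python
import re

# Entries 3-5 from the posted text; entry 4 has no trailing period.
ref = """3. Thelander, C. et al. Single electron transistors in heterostructure nanowires. Appl. Phys. Lett. 83,
2052±2054 (2003).
4. Wagner,R. S.in Whisker Technology (ed. Levitt, A. P.) 47±119 (Wiley. New York, 1970)
5. Hiruma, K. et al. Growth and optical properties of nanometer scale GaAs and InAs whiskers. J. Appl.
Phys. 77, 447±462 (1995).
"""

# Greedy: .+ runs to the end of the text and backtracks to the LAST "dddd).".
greedy = re.findall(r"\d{1,2}\.\s+.+\d{4}\)\.", ref, re.DOTALL)

# Non-greedy: .+? stops at the FIRST "dddd)." after each entry number.
lazy = re.findall(r"\d{1,2}\.\s+.+?\d{4}\)\.", ref, re.DOTALL)

# Alternative (an assumption about the layout, not from the thread): an entry
# runs from "N. " at the start of a line up to the next such line or the end.
entry_re = re.compile(r"^\d{1,2}\.\s.+?(?=^\d{1,2}\.\s|\Z)", re.MULTILINE | re.DOTALL)
entries = [m.strip() for m in entry_re.findall(ref)]

print(len(greedy), len(lazy), len(entries))  # 1 2 3
```

On this sample the greedy pattern returns one match covering everything, the non-greedy pattern returns two (entries 4 and 5 merged), and the lookahead version returns all three entries.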